ON THE IMPROVEMENT IN INTELLIGENCE
SCORES FROM THIRTEEN TO NINETEEN¹


EDWARD L. THORNDIKE
Institute of Educational Research, Teachers College, Columbia University 


In an earlier paper with a similar title (Journal of Educational 
Psychology, Dec. ’23, Vol. XIV, pp. 513 to 516), it was shown that 
pupils in Grades IX to XI, when retested a year later, showed a sub- 
stantial balance of improvement above what was due to the practice 
effect in the repeated test. It was also shown that the gain was very 
closely the same for pupils in Grade IX, in Grade X and in Grade XI, 
any decrease in gain with age being offset by the selection for continu- 
ance in high-school of those more capable of gain. 

It is the purpose of the present report to discover the influence of 
age when freed from the possible influence of selection. The test 
used is the same as that described in the earlier paper, and the subjects 
are a random selection of about two-thirds from those described there. 
The method is to compare the gains of a certain group of 13-year-olds 
found in Grade IX in June ’22 who were in high school in the same city 
a year later, with the gains of a group of 14-year-olds who represent 
approximately the same selection from all 14-year-olds as the 13-year- 
olds do from all 13-year-olds. The 14-year-olds taken are those who 
were found in Grade X in June ’22, and were in high school in the same 
city in June ’23. These are a slightly better selection than the 13- 
year-olds in respect of ability to improve in such tests, but not much so, 
the correlation between status and improvement being only about .1; 
and the difference in status between the 14-year-olds in Grade X and 
what the 13-year-olds in Grade IX would be in a year being not very
great. The same argument applies to the gains of pupils 14 in June ’22
in Grade IX who stayed a year longer in high school, compared with
the gains of pupils 15 in June ’22 in Grade X, who also stayed a year
longer. Similarly for the gains of pupils 15 in Grade X and those of
pupils 16 in Grade XI, and so on.

1 The investigation reported in this article was made possible by a grant from
the Carnegie Corporation.

Table I presents the median amount of gain for each of the groups. 
Table II, which is derived from Table I, presents the results of such 
comparisons as have just been described, using four groups, boys in 
city A, girls in city A, boys in city B, and girls in city B. Each entry 
of Table II is a measure of the superiority in gain of the younger group 
of the two. Even a casual inspection of these tables reveals that, in 
general, the younger group gains little more than the comparable group 
a year older, or two years older. 
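
As a minimal sketch of how one column of Table II follows from Table I, the lines below take the median gains of the 15-year-olds in Grade IX and of the 16-year-olds in Grade X as read from Table I, subtract the older group's gain from the younger group's within each city-and-sex group, and take the median of the four differences. The variable names are illustrative only and are not part of the original report.

```python
from statistics import median

# Median gains as read from Table I (after the practice allowance),
# in the order: A boys, A girls, B boys, B girls.
gain_15_in_ix = [11.7, 10.4, 9.8, 7.3]
gain_16_in_x = [6.8, 7.6, 12.8, 6.4]

# Superiority of the younger group = younger gain minus older gain.
superiority = [round(younger - older, 1)
               for younger, older in zip(gain_15_in_ix, gain_16_in_x)]
median_superiority = round(median(superiority), 2)

print(superiority)
print(median_superiority)
```

The four differences and their median correspond to the entries 4.9, 2.8, -3.0, .9 and +1.85 given in Table II for 15 in IX over 16 in X.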

We may check the effect of the greater ability (due to the slightly 
superior selection) to gain of the older groups in these pairs compared 
to what the younger groups would probably have at the later age, by 
comparing the 13-year-olds in Grade IX with the 16-year-olds in Grade 
XI or even with the 17-year-olds in Grade XI. The 16-year-olds in 
Grade XI probably, and the 17-year-olds in Grade XI surely, are less 
able to gain in such tests than the 13-year-olds in Grade IX will be 
when they become 16 or 17 respectively. In the same way, the 14-
year-olds in Grade IX may be compared with the 17-year-olds in Grade
XI and with the 18-year-olds in Grade XI; and the 15-year-olds in
Grade IX with the 18-year-olds in Grade XI. The results appear in 
Table III. 

As a reasonable lower-limit estimate for the superiority due to a
difference of two years, we may average the determinations of Table
II and the first, third, and fifth of Table III. This gives:


Superiority of gain from 13 to 14 over gain from 15 to 16........  .10
Superiority of gain from 14 to 15 over gain from 16 to 17........ -.90
Superiority of gain from 15 to 16 over gain from 17 to 18........ -.20
Superiority of gain from 16 to 17 over gain from 18 to 19........  .30
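
The averaging behind the first three lower-limit figures can be sketched as follows, assuming that the medians of the two-year differences in Table II are paired with the first, third, and fifth median determinations of Table III (.55, .20, and .95, as those values are used in the text). The dictionary keys are labels chosen here for illustration. The fourth figure (.30) has no Table III counterpart and is taken from Table II alone.

```python
# Medians of the two-year differences, as given in Table II.
table_ii = {
    "13-14 over 15-16": -0.35,
    "14-15 over 16-17": -2.05,
    "15-16 over 17-18": -1.35,
}
# First, third, and fifth median determinations of Table III, as used in the text.
table_iii = {
    "13-14 over 15-16": 0.55,
    "14-15 over 16-17": 0.20,
    "15-16 over 17-18": 0.95,
}

# Lower-limit estimate: the plain average of the two determinations.
lower_limit = {name: (table_ii[name] + table_iii[name]) / 2 for name in table_ii}

for name, value in lower_limit.items():
    print(f"{name}: {value:+.3f}")
```

Under these assumptions the first and third averages come out at .10 and -.20, as listed; the second comes out at -.925, which the report gives as -.90.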


As a reasonable upper limit we may use the first, third, and fifth
determinations of Table III and raise the superiority of 16 in IX over
18 in XI by a corresponding amount. This gives:


Superiority of gain from 13 to 14 over gain from 15 to 16........  .55
Superiority of gain from 14 to 15 over gain from 16 to 17........  .20
Superiority of gain from 15 to 16 over gain from 17 to 18........  .95
Superiority of gain from 16 to 17 over gain from 18 to 19........ 1.07


1 Just how great this difference is cannot be measured from any data now
available. It cannot, however, be very great, for the first year of high school is
the year of heavy elimination, and common observation teaches us that for a
person to be a high school sophomore at 14 is not a much greater sign of ability
than to be a freshman at 13.


On the whole it appears that if the 13- and 14-year-olds could 
all have been re-measured year after year till 18, with due allowance 
for practice, their gains would have been very little less in the last year 
than in the first. Calling the gain from 13 to 14, 100, the annual gains 
from 14 to 15, 15 to 16, and so on could hardly be less than 95, 90, 85, 
80, and 75. Nor could anybody really prove from the facts at hand 
any decrease in gain. 

Furthermore, the groups whose members were 16 in Grade IX may 
reasonably be taken as representing a group near the average intellect 
of all 16-year-olds in these cities. They actually made substantial 
gains during the year, and there is no evidence that they would not 
have done so in the two years following, or that they made any greater 
gains when they were passing from 13 to 14, or 14 to 15, or 15 to 16. 

TABLE I. THE MEDIAN GAINS, IN THE I.E.R. TEST OF SELECTIVE AND RELATIONAL
THINKING, GENERALIZATION AND ORGANISATION, AFTER SUBTRACTION OF 11.9 FOR
PRACTICE EFFECT,¹ IN THE CASE OF VARIOUS GROUPS MEASURED IN JUNE ’22 AND
JUNE ’23

The Numbers in Parentheses Represent the Number of Individuals in the Group
in Question

             13 in IX     14 in X      15 in XI     14 in IX     15 in X      16 in XI
A, Boys      15.6 (45)    13.5 (88)     9.1 (67)     7.9 (103)   14.6 (212)    9.3 (194)
A, Girls     12.9 (55)    13.4 (102)    9.7 (90)    10.9 (158)    8.4 (201)    9.4 (211)
B, Boys      14.8 (175)   15.6 (123)   21.6 (79)    12.4 (469)   15.2 (385)   16.6 (294)
B, Girls      9.3 (182)    9.9 (154)   13.2 (124)    6.8 (570)   13.3 (532)    9.5 (421)
Median       13.85        13.45        10.45         9.4         13.95         9.45

             15 in IX     16 in X      17 in XI     16 in IX     17 in X      18 in XI
A, Boys      11.7 (96)     6.8 (145)    5.8 (116)   17.0 (35)     1.3 (81)     4.1 (51)
A, Girls     10.4 (129)    7.6 (172)   11.8 (151)   11.6 (17)     8.6 (34)    15.4 (29)
B, Boys       9.8 (290)   12.8 (319)   11.1 (238)    8.4 (110)   12.3 (99)    10.6 (81)
B, Girls      7.3 (391)    6.4 (327)   11.6 (306)    7.4 (99)     6.7 (99)     4.6 (97)
Median       10.1          7.2         11.35        10.0          7.65         7.6





1 More exactly, 9.4 for individuals who were tested with Form B first and
Form A second, 14.4 for individuals who were tested with Form A first and
Form B second.
The doctrine that the ability to improve one’s score in a measure of
intelligence necessarily ceases at 14 or at 16, then, should be abandoned. 
Indeed, there seems to be evidence that this ability improves, at least 
in the case of those who are subject to intellectual education, beyond 18. 


TABLE II. THE SUPERIORITY OF THE YOUNGER GROUPS OVER THE COMPARABLE
OLDER GROUPS IN AMOUNT OF GAIN

Differences of One Year

             13 in IX  14 in X   14 in IX  15 in X   15 in IX  16 in X   16 in IX  17 in X
             over      over      over      over      over      over      over      over
             14 in X   15 in XI  15 in X   16 in XI  16 in X   17 in XI  17 in X   18 in XI
A, Boys        2.1       4.4      -6.7       5.3       4.9       1.0      15.7      -2.8
A, Girls       -.5       3.4       2.5      -1.0       2.8      -4.2       3.0      -6.8
B, Boys        -.8      -6.0      -2.8      -1.4      -3.0       1.7      -3.9    [illegible]
B, Girls       -.6      -3.3      -6.5       3.8        .9      -5.2        .7       2.1
Median         -.55      +.2      -4.65     +1.4      +1.85     -1.6      +1.85      -.35

Differences of Two Years

             13 in IX  14 in IX  15 in IX  16 in IX
             over      over      over      over
             15 in XI  16 in XI  17 in XI  18 in XI
A, Boys        6.5      -1.4       5.9      12.9
A, Girls       3.2       1.5      -1.4      -3.8
B, Boys       -6.8      -4.2      -1.3      -2.2
B, Girls      -3.9      -2.7      -4.3       2.8
Median         -.35     -2.05     -1.35      +.30





























TABLE III. THE SUPERIORITY OF THE YOUNGER GROUPS OVER OLDER GROUPS
WHICH REPRESENT PROBABLY INFERIOR SELECTIONS FROM WHAT THE
YOUNGER GROUPS WILL BE TWO YEARS LATER

             13 in IX  13 in IX  14 in IX  14 in IX  15 in IX
             over      over      over      over      over
             16 in XI  17 in XI  17 in XI  18 in XI  18 in XI
A, Boys        6.3       9.8       1.9       3.8       7.6
A, Girls       2.9       1.1       -.9      -4.5      -5.0
B, Boys       -1.8       3.7       1.3       1.8       -.8
B, Girls       -.2      -2.3      -4.8       3.3       2.7
Median          .55      2.4        .2       2.0        .95
THE INVALIDITY OF THE INSPECTIONAL METHOD
OF RANKING RADIOGRAPHS OF THE CARPAL 
AREA OF THE WRISTS 


JOSEPH C. MCELHANNON 


Professor of Education, Baylor College 


A careful examination of current professional literature reveals 
the fact that little is being done by way of experimentation to deter- 
mine the relation of anatomical growth to educational achievement. 
That which has been done has not been as scientific as it should be, 
or has dealt with too few cases. In the former cases, the measuring 
instruments probably were not exact, or certain factors were not taken 
into consideration and, in the latter, the conclusions drawn are 
nullified just as soon as more cases are studied. In many instances 
the inspectional method has been resorted to. It is the purpose of 
this article to show that the inspectional method is inadequate when 
compared with results arrived at scientifically. 

In 1921, the University of Chicago Department of Education 
began the study of the ossification of the carpal area of the wrist in 
pupils who entered the Laboratory Schools. It was proposed to 
measure each child on his birthday, or as near to it as possible for a 
period of ten years. From these data, it was proposed to make an 
exact study of the significance of anatomical development to the 
growth and education of school children. In 1922, Mr. T. M. Carter 
of the School of Education began a study of the carpal area for the 
purpose of setting up norms for the various ages and sexes.¹ To control
the variableness in the size of bones, an otherwise uncontrolled
factor, Mr. Carter devised a quadrilateral.

1 Freeman, F. N. and Carter, T. M.: A New Measure of the Development of
the Carpal Bones and Its Relationship to Physical and Mental Development.
Journal of Educational Psychology, 15, 257-270.

In order to make a more exact allowance for the size of the hand,
which is considerably larger in some children than in others of the
same age, a quadrilateral was drawn from the proximal points of the
radius, the ulna, the fifth metacarpal, and the proximal epiphysis of
the first metacarpal. By adding the real areas of the carpal bones and
dividing the total by the whole area of the quadrilateral, a more exact
measure of the state of ossification of the carpal area could be
determined than by simply taking the total area of the bones as an index
of anatomical development. In this way the factor of large-boned and
small-boned children was overcome, and the ratio of the bone area to the
quadrilateral area was thought to be considerably more significant.
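
The ratio just described can be sketched in a few lines. In the study the areas were measured with a planimeter; here, purely as an assumption for illustration, the quadrilateral is given by four hypothetical corner coordinates and its area is found by the shoelace formula, and the bone areas are hypothetical numbers. The function names are not taken from the original.

```python
def polygon_area(points):
    """Area of a simple polygon (shoelace formula); points listed in order around the outline."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def ossification_ratio(bone_areas, quadrilateral):
    """Sum of the measured bone areas divided by the area of the quadrilateral."""
    return sum(bone_areas) / polygon_area(quadrilateral)


# Hypothetical corners (radius, ulna, fifth metacarpal, first metacarpal epiphysis), in cm.
quadrilateral = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
# Hypothetical areas of the individual bones, in square cm.
bone_areas = [1.2, 0.9, 0.6, 0.3]

ratio = ossification_ratio(bone_areas, quadrilateral)
print(round(ratio, 3))
```

Because the bone area is divided by the area of the child's own quadrilateral, two hands of different size but equal relative ossification give the same ratio.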

In this experiment, the radiographs were studied in groups of 
twenty. The group was made up according to age and according 
to sex; for example, twenty seven-year-old boys or twenty ten-year-old 
girls. The radiographs of each of these age and sex groups were 
arranged in order from the least developed to the best developed. 
Accordingly, three strings were stretched one above the other across 
a window. The films of the least developed were placed on the top 
string, the best developed on the bottom string, and the medium 
developed on the middle string. Then, beginning at the extreme 
right on the top string, the radiographs were ranked according to the 
state of development. Next, the radiographs on the bottom string 
were ranked from the left to the right in order from less to greater 
development. This ranking was done for the purpose of demonstrating
how closely the inspectional ranking of the radiographs agreed
with the ranking by planimeter-measured carpal areas.

The experiment was undertaken with the utmost care, but on the
assumption that a high degree of correlation would be found, so that
prejudices and preconceived judgments would not have an undue
weight in the results. The rank-difference method was used in the
correlations, because it was a more rapid method of computation with
a few cases. In ranking by this method, the surface area of all the
bones, as they appeared in the skiagraphs, was given a great deal
of weight, but was by no means the sole factor taken into consideration;
for the shape of the bones, the number of bones, the particular bones 
themselves, and their area in relation to the total area of the quadri- 
lateral seemed likewise to be matters of fundamental importance. 
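
The rank-difference method mentioned above is commonly computed as 1 - 6Σd²/(n(n² - 1)), where d is the difference between the two ranks given to the same film. The sketch below assumes that this is the formula intended and that there are no tied ranks; the five ranks shown are hypothetical and the function name is illustrative only.

```python
def rank_difference_correlation(ranks_a, ranks_b):
    """Rank-difference (Spearman) coefficient for two untied rankings of the same items."""
    n = len(ranks_a)
    if n != len(ranks_b) or n < 2:
        raise ValueError("rankings must have the same length, at least 2")
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d_squared / (n * (n * n - 1))


# Hypothetical example: inspectional ranks against measured ranks for five films.
inspected = [1, 2, 3, 4, 5]
measured = [2, 1, 3, 5, 4]
rho = rank_difference_correlation(inspected, measured)
print(round(rho, 2))
```

With twenty films per group, as in this experiment, the denominator is 20 × 399 = 7980, so the computation is short even by hand.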

In this experiment, the above mentioned factors were not weighed 
as to their relative importance, for the problem was to make a judg- 
ment on the basis of acute impression of all of the factors taken 
together. General impression, then, may be said to be the uncon- 
trolled factor in the experiment. 


AN EXAMINATION OF THE RESULTS FOUND FROM INSPECTION OF
RADIOGRAPHS 


Composite Rankings. In beginning the inspectional study of the
skiagraphic representation seen in radiographs of the carpal area
of the hand, it was decided to make six different rankings, then to
make a composite ranking of the results of the six, in order that the
deceptive perceptual characteristics would be overcome. These
rankings were: (a) the multangulum majus, multangulum minus,
and naviculare, (b) lunatum and triquetrum, (c) capitatum and
hamatum, (d) the epiphyses of the ulna and radius, (e) the epiphysis of the
ulna, charting the stages of fusion, (f) general observation.

In Table I, the results of the several methods of ranking are 
listed in terms of correlation with the method of general ranking which 
was the first one used in all of the different groups. The purpose was 
to determine whether there was any advantage in making several 
kinds of rankings, and also to learn which method was the best. These 
correlations show that all of the rankings correlate very highly with 


TABLE I. CORRELATION OF SEVERAL METHODS OF RANKING WITH RANKING BY
THE METHOD OF GENERAL OBSERVATION

                                   7 year  7 year  8 year  8 year  9 year  9 year  10 year  10 year
                                   boys    girls   boys    girls   boys    girls   boys     girls
[label illegible]                   .96     .99     .93     .99     .97     .90     .99      .88
Multangulum Minus and Naviculare    .96     .99     .99     .98     .99     .98     .98      .85
Lunatum and Triquetrum              .96     .99     .86     .96     .98     .95     .96      .89
Capitatum and Hamatum               .95     .94     .98     .92     .91     .97     .94      .81
Radius and Ulnar Epiphyses          .95     .88     .94     .82     .74     .94     .67      .60
Ulnar Epiphyses                     .89     .80     .84     .78     .81     .93     .81      .51
Composite                          1.00     .99     .98     .99     .98     .99     .98      .83





























General Observation, in nearly all cases above .90, and that general
observation correlates considerably lower with the measured ranking, as will
be shown in Table II. This indicates the general error made in ranking
by inspection. The fact that the composite method of ranking
correlates almost perfectly with the first method tried out, namely,
that of general observation, proves undoubtedly that one method of
inspectional ranking is as good as another, and that it is not necessary
at all to expend the time required to make all of the rankings. There
is an indication also that, however much the radiographs
are shifted and shuffled when the method of ranking is changed, the
sensory elements in the radiographs themselves are a constant which
makes the results about the same.
It is also noticed in Table I that the correlation of the ranking by
the ulnar epiphyses method is not nearly so high as it is by any of the 
other methods. This is explained by the fact that there is a great deal 
more irregularity in the appearance of this epiphysis than in the 
appearance of any other bone. Another reason is found in the many 
different shapes that this bone assumes. The variability in the time 
of appearance and in the shape of the ulnar epiphysis undoubtedly 
contributed to the smaller correlation found by ranking according to 
the epiphysis of both the ulna and the radius. It is evident that the
inspector is deceived more by the ulna than by any other bone or
combination of bones in the carpal area.

Correlation with the Measured Radiographs.—The direct results of 
the whole inspectional process of ranking radiographs are listed in 
Table II. The measured rankings of the radiographs are made by measuring
the area of a quadrilateral constructed on the proximal points of the
radius, ulna, fifth metacarpal, and the epiphysis proximus of the first
metacarpal, then finding the sum of the areas of the multangulum
majus, multangulum minus, naviculare, lunatum, triquetrum,
capitatum, hamatum, pisiforme, and the epiphyses of the radius and the
ulna, and finally determining the ratio of the latter to the quadrilateral.


TABLE II. CORRELATION OF INSPECTIONAL WITH MEASURED RANKINGS

7 year  7 year  8 year  8 year  9 year  9 year  10 year  14 year
boys    girls   boys    girls   boys    girls   girls    boys
 .85     .91     .82     .73     .75     .62     .41      .53

| 























There is an indication that error in making inspections is less 
in the earlier ages than it is in the later ages. This is easily accounted 
for from the fact that there is a greater unossified area in the carpals of 
the younger ages than in the older ages, which gives the inspector an 
opportunity to notice more carefully the bones that are present. In 
the older ages, there is considerable overlapping of the various bones, 
and all of the areas look very much alike. After securing such a low 
correlation in ranking the radiographs of ten-year-old girls, it was 
decided to select a still older age for inspection in order to see if it were 
possible to do a better job of ranking by the inspectional method. 
There is very little difference in the state of ossification of the wrist- 
joints of ten-year-old girls and fourteen-year-old boys. As is shown in 
Table II the record of correlation is only slightly better than it was in 
the case of ten-year-old girls. At the beginning of the experiment,
only seven, eight, and nine-year-old boys and girls were to be ranked. 
After getting low correlation with nine-year-old girls, ten-year-old 
girls were ranked with the result that the correlation was still lower. 
It appears highly plausible that it becomes gradually more difficult to 
rank by inspection as the carpal areas become ossified. 

As was shown in Table I, less correlation with the general observa- 
tion method of ranking was secured by ranking the ulnas of the 
different roentgenograms of the group. It was thought that probably 
the ulnar rankings might correlate more closely with the planimeter 
measurements than the composite rankings. Table III shows the 
results of correlation with the planimeter measurement of the total 
ossified area and also with the planimeter measured epiphysis of the 
ulna. The results shown in the table indicate that there is a much 


TABLE III. THE CORRELATION OF THE INSPECTED ULNAR EPIPHYSIS WITH
THE TOTAL MEASURED CARPAL AREA AND WITH THE MEASURED ULNAR
EPIPHYSIS

                   7 year  7 year  8 year  8 year         9 year  9 year  10 year  14 year
                   boys    girls   boys    girls          boys    girls   girls    boys
Total Area          .79     .88     .69     .39            .17     .66     .49      .27
Ulnar Epiphysis     .87     .95     .91    [.38 or .88]    .91     .87     .66      .68





| 
| 























higher correlation when the inspectional ranking of the ulnar epiphysis
is compared with the measured ulnar epiphysis than when it is compared
with the total measurement. By comparing the ulna ranking correlations
and the composite or general observation ranking correlations
with the measured rankings, it is seen that there are much wider
margins of difference in the ulna ranking correlations. Such a comparison
indicates that probably the ulna epiphysis is the most deceptive of all
of the carpal bones for the purpose of inspectional ranking.

Middle Shift Error—In ranking the twenty radiographs, it was 
stated that three strings were stretched one above the other; that in 
the first rough grouping of the films the least developed were placed on 
the top string, the greatest developed on the bottom string, and the 
remaining third on the middle string. It would seem that the cases 
which showed the least and the greatest development would be more 
accurate and there would be a greater tendency to make errors in 
ranking the films on the middle string. Accordingly, the deviation of
the rank of each film in the inspectional results from its true place as
noted in the measured result was calculated for each age and sex to
bring out whether this is true. These deviations are shown in Table
IV. The first four radiographs as ranked from the least developed in


TABLE IV. INDIVIDUAL AND AVERAGE GROUP DEVIATIONS OF INSPECTIONAL
FINDINGS FROM THE REAL RANKINGS OF THE VARIOUS AGES AND SEXES

[Values are given in printed order; their alignment with film ranks 1 to 20 is not recoverable for every row.]

7 year boys      1  1 10  3  2  2  2  3  2  1  4  1  6  2  3  1  4  2  0
7 year girls     0  4  2  2  2  5  1  1  2  4  2  4  3  2  0  2  2  1
8 year boys      6  1  2  0  2  0  2  0  0  0
8 year girls     7  1  1  3  4  3  9  3  4  4  7  5  4  2  3  5  2  2  2  1
9 year boys      0  4  0  1  1  5  5  2  0  8  5  4  4  1  1  3  2  2  3
9 year girls     0  2 12  1  0  0  1  5  2  4  7  1  6  3  2  3  5  3  4
10 year girls    8 16  0  0  1  5  1  7  7  2  5  2  0  3  5 11  2  5  5

Average, by sections of four films     3.1    2.9    2.7    3.0    2.7





each age and sex are placed in the first section, the second four in the 
second section, etc., until five sections are formed. The average 
deviations are then summed and the average for the four found. As 
is shown in Table IV, the average deviation in the first section is the 
largest of any of the averages. One would naturally expect that there 
would be more errors made in the middle section than anywhere else. 
This may be accounted for from the fact that there are twenty or 
fewer roentgenograms in each age and sex ranked, which does not 
furnish a large enough distribution for errors to be grouped at the 
middle. 
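
The grouping into sections of four can be sketched as below. The twenty deviations are those read from the row of Table IV that appears to belong to the eight-year-old girls; that identification is an assumption, since the row label is not legible, and the function name is illustrative only.

```python
def section_means(deviations, size=4):
    """Mean deviation within consecutive sections of `size` films, in rank order."""
    sections = [deviations[i:i + size] for i in range(0, len(deviations), size)]
    return [sum(section) / len(section) for section in sections]


# Deviations of inspectional rank from measured rank, films 1 to 20 (assumed: 8 year girls).
eight_year_girls = [7, 1, 1, 3, 4, 3, 9, 3, 4, 4, 7, 5, 4, 2, 3, 5, 2, 2, 2, 1]

means = section_means(eight_year_girls)
overall = sum(eight_year_girls) / len(eight_year_girls)

print(means)
print(overall)
```

The five section means and the overall mean agree, to the first decimal, with the entries 3.0, 4.7, 5.0, 3.5, 1.8 and 3.6 given for the eight-year-old girls in Table V.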

It may be noticed from this table also that there are only four devia- 
tions over ten points from the real planimeter-quadrilateral measure. 
The average deviation of all of the films ranked in this experiment is 
only 2.9. It might be contended that out of as large a number as
twenty radiographs a deviation from the true rank of only two is slight
and that such accuracy by ranking inspectionally is close enough
for practical purposes. It is true that any film just 2.9 points from
the true position in ranks can hardly be distinguished by the naked 
eye when it is placed side by side with the true one. For anatomical 
purposes, the film selected by the inspectional process would serve 
the purpose of description; but if anatomical development is to be
refined to the point that the skeletal age of children can be determined 
from it, and consequent classification and grading made, it is obvious 
that the more accurate the measurement, the better it will be for the 
child so classified. 

By assembling the deviations by groups of four radiographs, as 
was done in Table IV, according to age and sex, and finding the average 
of the deviations for each group, it becomes evident that there is no 
more likelihood of a tendency for errors to be grouped in the middle for 
the older ages than there is for the younger. In Table V it is clearly 
shown that there is no tendency for the errors to accumulate at the 
middle of the distribution. 


TABLE V. AVERAGE DEVIATIONS FROM THE TRUE RANK BY THE DIFFERENT AGES

                 Section  Section  Section  Section  Section
                 1        2        3        4        5        Average
7 year boys       3.7      2.6      2.0      3.0      2.0      2.6
7 year girls      2.0      2.3      3.0      1.8      1.3      2.3
8 year boys       2.3      1.3      1.0   [illegible] [illegible] 1.5
8 year girls      3.0      4.7      5.0      3.5      1.8      3.6
9 year boys    [section entries illegible]                     2.5
9 year girls      3.7      2.5      2.6      3.5      4.3      3.2
10 year girls     4.6      3.6      4.0      4.8      4.0      4.3
14 year boys      3.8      5.0   [5.3 or 5.8] 4.5     2.3      4.3























In order to demonstrate the value of practice and to test perceptual 
acuity, the experimenter made a second inspection of each age and sex 
group, except that of the fourteen-year-old boys, exactly three weeks
from the time that the first inspections were made. The results shown 
in Table VI may be significant. 


TABLE VI. CORRELATION OF INSPECTIONAL RANKINGS THREE WEEKS APART

7 year  7 year  8 year  8 year  9 year  9 year  10 year  14 year
boys    girls   boys    girls   boys    girls   girls    boys
 .95     .97     .93     .92     .96     .90     .95     (not re-ranked)























It is seen that the correlation is almost perfect, although the radio- 
graphs were shifted and the inspector had no idea what his previous 
rankings were. It appears that practice plays a very small part in the 
inspectional process. As is evident from the high coefficient found in 
the re-rankings, the correlation of these new rankings with the
planimeter measurements would be just about the same as that of the rankings
made at first.

It appears also that the same sensory elements that caused errors 
in the first inspections made the errors in the second inspections. 
It is evident that the process of inspection could not be refined to a
degree where all of the deceptive perceptual elements could be
eliminated, unless a number of objective devices, such as the quadrilateral,
were used.


CONCLUSIONS 


1. Judging from the introspectional data, the inspectional process 
is dependent upon the perceptual senses almost entirely. There are
so many sensory factors included in the data that must be scanned
that there is very little place for analysis. The result follows that
the correctness of the judgment in ranking the radiographs is to a very 
considerable degree affected by one’s ability to discriminate sensory 
factors. 

2. The use of any device, such as the quadrilateral imposed upon 
the carpal area, measuring the perpendicular and horizontal distances 
of the quadrilateral with a rule, measuring the distances across the 
various bones with any objective device, aids to a great extent in 
ranking more accurately the radiographs. These devices, however, 
are objective and destroy the real value of inspectional results. 

3. It matters little which combination of bones is taken for the 
purpose of ranking by inspection. It is better, however, to take a 
combination than it is to take any one certain bone like the ulna. 
Probably the better procedure is to make a careful general observation 
and base the rankings upon this, as the results do not seem to be any 
more accurate when several rankings are made. 

4. There appears to be no improvement of perceptual discrimination
by practice as observed in separate rankings three weeks apart. The 
same sensory elements are to be interpreted, and acuity of discrimina- 
tion is enhanced very little. 

5. In such a small number as twenty radiographs, there appears to 
be no tendency to make more errors in the middle of the distribution, 
but it is evident that errors of ranking tend to increase with the age of 
the pupil. 

6. From the correlation of the inspectional results with the planimeter
measurements of the bones of the carpal area, it is patent that
ranking by inspection is not accurate, nor is it relatively so. While it
appears that the error is of no great significance, from a scientific point
of view it is certainly noteworthy. If educational usage is to be
based upon anatomical development, then any arbitrary measurement
like inspection must be rejected because of its inaccuracy, and a more
scientific measuring device used.
A PERCENTAGE EQUIVALENT FOR THE
COEFFICIENT OF CORRELATION 


P. H. NYGAARD 


North Central High School, Spokane, Washington 


I. THE PROBLEM


The amount of relationship indicated by a coefficient of correlation 
is one of the least understood and most abstruse points in the field of 
statistics. That a coefficient of zero between 2 traits means that there 
is no relationship between them, of plus 1 a perfect direct relationship, 
and of minus 1 a perfect inverse relationship, is simple enough. But 
for all intervening values of the coefficient, the interpretation is far 
from simple. A coefficient of .6 does not show twice the amount of 
relationship indicated by one of .3. Nor does an r of .75 signify a 
relationship three-fourths as much in amount as a perfect relationship. 
No simpler explanation has been offered than the following: The
expression $\sqrt{1 - r_{AB}^2}$ indicates the ratio between the probable error
when A is predicted from B and the probable error when A is merely
selected as the average trait A; or, what amounts to the same thing,
the expression $1 - \sqrt{1 - r_{AB}^2}$ indicates the percentage of reduction
of the probable error when A is predicted from B from what it is when
A is simply guessed to be the average trait A. Using either method,
it is seen that an r of .866 is necessary in order to reduce the probable
error of prediction to one-half of the probable error of guessing.
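
The two expressions can be evaluated for a few coefficients to show how slowly the error of prediction shrinks as r rises. This is a minimal sketch; the function names are chosen here for illustration and do not appear in the original.

```python
import math


def error_ratio(r):
    """sqrt(1 - r^2): probable error of prediction relative to that of guessing the average."""
    return math.sqrt(1.0 - r * r)


def percent_reduction(r):
    """Percentage by which prediction reduces the probable error of guessing."""
    return 100.0 * (1.0 - error_ratio(r))


for r in (0.3, 0.6, 0.75, 0.866):
    print(f"r = {r:.3f}  ratio = {error_ratio(r):.3f}  reduction = {percent_reduction(r):.1f} per cent")
```

On this reckoning a coefficient of .6 reduces the probable error by 20 per cent, not by twice the reduction given by .3, which is the point made in the paragraph above.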

The writer believes that he has discovered a method by which a 
much more direct and understandable interpretation may be made of 
the amount of relationship indicated by a coefficient of correlation. 
Would it not be simple if a coefficient of, say, .65 between mathematics 
marks and intelligence ratings meant that a pupil’s intelligence, or 
the factors that cause it, constituted 65 per cent of the factors making 
for the pupil’s success in mathematics, and that 35 per cent of the 
factors making for his success in mathematics were entirely separate 
and distinct from his intelligence? Of course, it is not true. The
writer, however, proposes a formula by which the percentage of
mutual dependence of one trait upon another may be correctly
calculated.
II. SOLUTION


Let us assume that trait C depends for its value entirely upon
traits A and B. Traits A and B are uncorrelated; that is, $r_{AB} = 0$.
If trait A is measured, then trait B may be a composite of all the other
traits, independent of A, upon which C depends.

Assume further that in determining C the weight of A is k and of B 
is h, trait C being made up of k times trait A plus h times trait B plus 
a constant, f. The constant, f, disappears in the deviations from the 
mean. In order to compensate for differences in distributions and 
units of measurement, the measures of A and B should be converted to 
sigma scores in determining C. The following outline shows what 
is meant: 

Measures   Measures   Measures of C                Deviations   Deviations   Deviations of C
of A       of B                                    of A         of B
X_1        Y_1        kX_1/σ_A + hY_1/σ_B + f      x_1          y_1          kx_1/σ_A + hy_1/σ_B
X_2        Y_2        kX_2/σ_A + hY_2/σ_B + f      x_2          y_2          kx_2/σ_A + hy_2/σ_B
X_3        Y_3        kX_3/σ_A + hY_3/σ_B + f      x_3          y_3          kx_3/σ_A + hy_3/σ_B
X_n        Y_n        kX_n/σ_A + hY_n/σ_B + f      x_n          y_n          kx_n/σ_A + hy_n/σ_B

It will first be necessary to show that 

$$r_{AC}^2 + r_{BC}^2 = 1 \qquad \text{(Formula 1)}$$


This formula is mentioned by Kelley,¹ but the proof here given is 
original and is included to show that the formula holds for the specific 
assumptions made above. All r’s used are found by the product- 
moments method, but from the nature of the problem only positive 
values of r are considered. 
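From the above outline, 

$$r_{AC} = \frac{\sum x(kx/\sigma_A + hy/\sigma_B)}{\sqrt{\sum x^2}\,\sqrt{\sum (kx/\sigma_A + hy/\sigma_B)^2}} = \frac{(k/\sigma_A)\sum x^2 + (h/\sigma_B)\sum xy}{\sqrt{\sum x^2}\,\sqrt{(k^2/\sigma_A^2)\sum x^2 + (2kh/\sigma_A\sigma_B)\sum xy + (h^2/\sigma_B^2)\sum y^2}}$$

Since $r_{AB} = 0$, then $\sum xy = 0$. 
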
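¹ Kelley, T. L.: “Statistical Method.” P. 173. 

Therefore, 

$$r_{AC} = \frac{(k/\sigma_A)\sum x^2}{\sqrt{\sum x^2}\,\sqrt{(k^2/\sigma_A^2)\sum x^2 + (h^2/\sigma_B^2)\sum y^2}}$$

Let $\sum y^2 = p\sum x^2$; or $p = \sum y^2/\sum x^2 = \sigma_B^2/\sigma_A^2$. Then 

$$r_{AC} = \frac{(k/\sigma_A)\sum x^2}{\sqrt{\sum x^2}\,\sqrt{(k^2/\sigma_A^2)\sum x^2 + (ph^2/\sigma_B^2)\sum x^2}} = \frac{k/\sigma_A}{\sqrt{k^2/\sigma_A^2 + ph^2/\sigma_B^2}}$$

Similarly, 

$$r_{BC} = \frac{\sum y(kx/\sigma_A + hy/\sigma_B)}{\sqrt{\sum y^2}\,\sqrt{\sum (kx/\sigma_A + hy/\sigma_B)^2}} = \frac{(k/\sigma_A)\sum xy + (h/\sigma_B)\sum y^2}{\sqrt{\sum y^2}\,\sqrt{(k^2/\sigma_A^2)\sum x^2 + (2kh/\sigma_A\sigma_B)\sum xy + (h^2/\sigma_B^2)\sum y^2}}$$

Since $\sum xy = 0$, and $\sum x^2 = \sum y^2/p$, 

$$r_{BC} = \frac{(h/\sigma_B)\sum y^2}{\sqrt{\sum y^2}\,\sqrt{(k^2/p\sigma_A^2)\sum y^2 + (h^2/\sigma_B^2)\sum y^2}} = \frac{h/\sigma_B}{\sqrt{k^2/p\sigma_A^2 + h^2/\sigma_B^2}}$$

Hence, 

$$r_{AC}^2 + r_{BC}^2 = \frac{k^2/\sigma_A^2}{k^2/\sigma_A^2 + ph^2/\sigma_B^2} + \frac{h^2/\sigma_B^2}{k^2/p\sigma_A^2 + h^2/\sigma_B^2}$$

Or, 

$$r_{AC}^2 + r_{BC}^2 = \frac{k^2/\sigma_A^2}{k^2/\sigma_A^2 + ph^2/\sigma_B^2} + \frac{ph^2/\sigma_B^2}{k^2/\sigma_A^2 + ph^2/\sigma_B^2} = 1$$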

This formula holds irrespective of the forms of distribution of 
traits A and B, and it was specifically provided in the proof that $\sigma_A$ 
could differ from $\sigma_B$. A similar proof can be given even if traits A 
and B are not first changed to sigma scores, that is, if trait C is simply 
kX + hY + f. Only two assumptions are made, viz., $r_{AB}$ must equal 
0, and trait C must depend for its value entirely upon traits A and B 
in a fixed ratio. 
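
The remark that formula 1 also holds for raw scores can be checked on a small made-up case. This is only a sketch: the four measures of X and Y and the weights k, h, f below are invented for illustration (chosen so that the two traits are exactly uncorrelated) and do not come from the article.

```python
import math

def pearson(u, v):
    """Product-moment correlation computed from deviations about the means."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    du = [a - mu for a in u]
    dv = [b - mv for b in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return num / den

# Hypothetical raw measures; deviations are (-3, -1, 1, 3) and (1, -1, -1, 1),
# whose cross-products sum to zero, so r_AB = 0.
X = [7, 9, 11, 13]
Y = [6, 4, 4, 6]

# Hypothetical raw weights: C = kX + hY + f, with no conversion to sigma scores.
k, h, f = 4.0, 7.0, 2.0
C = [k * x + h * y + f for x, y in zip(X, Y)]

r_ab = pearson(X, Y)
r_ac = pearson(X, C)
r_bc = pearson(Y, C)
total = r_ac ** 2 + r_bc ** 2

print(f"r_AB = {r_ab:.3f}, r_AC = {r_ac:.4f}, r_BC = {r_bc:.4f}")
print(f"r_AC^2 + r_BC^2 = {total:.6f}")
```

With raw weights the sum of squares still comes to 1, although k/(k + h) is then no longer the ratio of dependence, since the weights are not expressed in sigma units.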

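As C depends k times upon A to h times upon B, the ratio of 
dependence, or percentage of dependence, of C upon A will be k/(k + h) 
and upon B, h/(k + h). Let $d_{AC}$ denote the percentage of dependence 
of C upon A, or, more generally, the percentage of mutual dependence 
between C and A. It will now be proved that 

$$d_{AC} = \frac{r_{AC}}{r_{AC} + \sqrt{1 - r_{AC}^2}} \quad\text{and}\quad d_{BC} = \frac{r_{BC}}{r_{BC} + \sqrt{1 - r_{BC}^2}} \qquad \text{(Formula 2)}$$

As before shown, 

$$r_{AC} = \frac{\sum x(kx/\sigma_A + hy/\sigma_B)}{\sqrt{\sum x^2}\,\sqrt{\sum (kx/\sigma_A + hy/\sigma_B)^2}}$$

Let $\sum (kx/\sigma_A + hy/\sigma_B)^2 = v\sum x^2$; or $v = \sum (kx/\sigma_A + hy/\sigma_B)^2 / \sum x^2$. Then 

$$r_{AC} = \frac{(k/\sigma_A)\sum x^2 + (h/\sigma_B)\sum xy}{\sqrt{\sum x^2}\,\sqrt{v\sum x^2}} = \frac{k/\sigma_A}{\sqrt{v}}$$

Therefore, 

$$k = r_{AC}\,\sigma_A\sqrt{v} = r_{AC}\,\sigma_C.$$

Also, 

$$r_{BC} = \frac{\sum y(kx/\sigma_A + hy/\sigma_B)}{\sqrt{\sum y^2}\,\sqrt{\sum (kx/\sigma_A + hy/\sigma_B)^2}}$$

Let $\sum (kx/\sigma_A + hy/\sigma_B)^2 = w\sum y^2$; or $w = \sum (kx/\sigma_A + hy/\sigma_B)^2 / \sum y^2$. Then 

$$r_{BC} = \frac{(k/\sigma_A)\sum xy + (h/\sigma_B)\sum y^2}{\sqrt{\sum y^2}\,\sqrt{w\sum y^2}} = \frac{h/\sigma_B}{\sqrt{w}}$$

Therefore, 

$$h = r_{BC}\,\sigma_B\sqrt{w} = r_{BC}\,\sigma_C.$$

$$d_{AC} = \frac{k}{k + h} = \frac{r_{AC}\,\sigma_C}{r_{AC}\,\sigma_C + r_{BC}\,\sigma_C} = \frac{r_{AC}}{r_{AC} + r_{BC}}$$

But from formula 1, 

$$r_{BC} = \sqrt{1 - r_{AC}^2}$$

Therefore, 

$$d_{AC} = \frac{r_{AC}}{r_{AC} + \sqrt{1 - r_{AC}^2}}$$

By analogy, 

$$d_{BC} = \frac{r_{BC}}{r_{BC} + \sqrt{1 - r_{BC}^2}}$$

Formula 2 can also be proved by means of multiple correlation and 
the calculation of a regression equation to determine C. Such a 
proof would be easier for persons acquainted with this field. 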


III. VERIFICATION 


In order to verify the two formulas and incidentally to make clear 
the terms used, an illustration will be given. For the illustration 
below, $\sigma_A = \sqrt{14}$ and $\sigma_B = \sqrt{23.2}$. In determining C, $k = 2\sqrt{14}$ and 
$h = 3\sqrt{23.2}$, each measure of C being $2\sqrt{14}\cdot X/\sigma_A + 3\sqrt{23.2}\cdot Y/\sigma_B + 10$. 
The radical values for k and h were used, obviously, 
in order that the measures of C may become rational. Then 

$$d_{AC} = \frac{k}{k + h} = \frac{2\sqrt{14}}{2\sqrt{14} + 3\sqrt{23.2}}$$

and 

$$d_{BC} = \frac{h}{k + h} = \frac{3\sqrt{23.2}}{2\sqrt{14} + 3\sqrt{23.2}}$$

   A     B     C      x     y     z     xy    xz    yz    x²    y²    z²

   3    17    67     -7    -3   -23     21   161    69    49     9   529
   6    20    82     -4     0    -8      0    32     0    16     0    64
   8    11    59     -2    -9   -31     18    62   279     4    81   961
   9    25   103     -1     5    13     -5   -13    65     1    25   169
  10    29   117      0     9    27      0     0   243     0    81   729
  10    23    99      0     3     9      0     0    27     0     9    81
  11    19    89      1    -1    -1     -1    -1     1     1     1     1
  12    21    97      2     1     7      2    14     7     4     1    49
  14    20    98      4     0     8      0    32     0    16     0    64
  17    15    89      7    -5    -1    -35    -7     5    49    25     1

  10    20    90                         0   280   696   140   232  2648
  Av.   Av.   Av.                      Σxy   Σxz   Σyz   Σx²   Σy²   Σz²



































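We see first that the condition $r_{AB} = 0$ is satisfied, since $\sum xy = 0$. 

$$r_{AC} = \frac{280}{\sqrt{140 \cdot 2648}} = 2\sqrt{\frac{140}{2648}}, \qquad r_{BC} = \frac{696}{\sqrt{232 \cdot 2648}} = 3\sqrt{\frac{232}{2648}}$$

Formula 1 is verified, because 

$$r_{AC}^2 + r_{BC}^2 = \frac{560}{2648} + \frac{2088}{2648} = 1.$$

Using formula 2, 

$$d_{AC} = \frac{2\sqrt{140/2648}}{2\sqrt{140/2648} + \sqrt{1 - 560/2648}} = \frac{2\sqrt{140}}{2\sqrt{140} + \sqrt{2088}} = \frac{2\sqrt{140}}{2\sqrt{140} + 3\sqrt{232}} = \frac{2\sqrt{14}}{2\sqrt{14} + 3\sqrt{23.2}}$$

This is the value for $d_{AC}$ given above. Using formula 2 again, 

$$d_{BC} = \frac{3\sqrt{232/2648}}{3\sqrt{232/2648} + \sqrt{1 - 2088/2648}} = \frac{3\sqrt{232}}{3\sqrt{232} + \sqrt{560}} = \frac{3\sqrt{232}}{3\sqrt{232} + 2\sqrt{140}} = \frac{3\sqrt{23.2}}{2\sqrt{14} + 3\sqrt{23.2}}$$

This is the value for $d_{BC}$ given above. 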
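
The verification can be repeated mechanically. The sketch below is an illustration, not part of the original article. It assumes the first measure of B is 17, the value consistent with the printed deviation (-3), the mean of 20 and the sum of squares 232; all variable names are chosen here for convenience.

```python
import math

# Measures of A and B from the illustration (first B taken as 17; see lead-in).
A = [3, 6, 8, 9, 10, 10, 11, 12, 14, 17]
B = [17, 20, 11, 25, 29, 23, 19, 21, 20, 15]
n = len(A)

mean_a, mean_b = sum(A) / n, sum(B) / n
x = [a - mean_a for a in A]          # deviations of A
y = [b - mean_b for b in B]          # deviations of B
sigma_a = math.sqrt(sum(v * v for v in x) / n)   # sqrt(14)
sigma_b = math.sqrt(sum(v * v for v in y) / n)   # sqrt(23.2)

# Weights used in the article: k = 2*sqrt(14), h = 3*sqrt(23.2), constant 10.
k, h, f = 2 * math.sqrt(14), 3 * math.sqrt(23.2), 10
C = [k * a / sigma_a + h * b / sigma_b + f for a, b in zip(A, B)]
mean_c = sum(C) / n
z = [c - mean_c for c in C]          # deviations of C

def corr(u, v):
    """Product-moment r from two lists of deviations."""
    return sum(p * q for p, q in zip(u, v)) / math.sqrt(
        sum(p * p for p in u) * sum(q * q for q in v))

r_ab, r_ac, r_bc = corr(x, y), corr(x, z), corr(y, z)

# Formula 2 applied to each correlation.
d_ac = r_ac / (r_ac + math.sqrt(1 - r_ac ** 2))
d_bc = r_bc / (r_bc + math.sqrt(1 - r_bc ** 2))

print(f"r_AB = {r_ab:.3f}")
print(f"r_AC^2 + r_BC^2 = {r_ac ** 2 + r_bc ** 2:.6f}")
print(f"d_AC = {d_ac:.4f}  vs  k/(k+h) = {k / (k + h):.4f}")
print(f"d_BC = {d_bc:.4f}  vs  h/(k+h) = {h / (k + h):.4f}")
```

Both routes give the same percentages of dependence, roughly 34 per cent upon A and 66 per cent upon B.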








IV. APPLICATION 


Although formula 1 was included in order to prove formula 2, nevertheless 
it may in itself be of considerable use. To illustrate: if scholarship 
correlates .50 with intelligence, then a composite of all the other 
factors, separate and distinct from intelligence, that produce scholastic 
ability should give the high correlation of $\sqrt{1 - (.50)^2}$, or .866, with 
scholarship. By using a number of factors in an attempt to predict 
scholastic ability, correlations as high as about .90 have been obtained 
between scholarship and a composite of all the factors used. But, 
even so, the omitted factors are, according to formula 1, important 
enough to yield a correlation of $\sqrt{1 - (.9)^2}$, or .44, with scholarship. 
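
The two figures in this paragraph follow directly from formula 1. A minimal sketch, with `residual_correlation` as a hypothetical name for the correlation of the omitted composite:

```python
import math

def residual_correlation(r_known: float) -> float:
    """Correlation of the composite of all omitted, independent factors
    with the criterion, given the correlation of the known factor(s)."""
    return math.sqrt(1.0 - r_known ** 2)

print(round(residual_correlation(0.50), 3))  # 0.866
print(round(residual_correlation(0.90), 3))  # 0.436
```

The second value rounds to the .44 quoted in the text.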

Formula 2 gives, it seems to the writer, a clearer idea of the amount 
of relationship indicated by a correlation coefficient than has hitherto 
been available. An understanding of the probable error concept is 
unnecessary. The relationship is expressed in terms of percentage, 
a terminology with which every one is familiar. A correlation of 
.50 between mathematics and intelligence would be interpreted to 
mean that about 37 per cent of a person’s ability to master mathe- 
matics is due to intelligence, and 63 per cent to other factors; or, stated 
more generally, that 37 per cent of the elements that make for mathe- 
matics scholarship are elements that also produce general intelligence. 
A correlation of .707 would be necessary in order that mathematics 
achievement may depend 50 per cent upon intelligence. 
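
The conversions quoted above can be reproduced with a short sketch. The function `dependence` is formula 2; `correlation_for` is an inverse obtained here by rearranging formula 2 (it is not given in the article), and both names are hypothetical.

```python
import math

def dependence(r: float) -> float:
    """Formula 2: percentage of mutual dependence d for a correlation r (0 <= r <= 1)."""
    return r / (r + math.sqrt(1.0 - r * r))

def correlation_for(d: float) -> float:
    """Inverse of formula 2, from r / sqrt(1 - r^2) = d / (1 - d)."""
    return d / math.sqrt(d * d + (1.0 - d) ** 2)

print(f"d for r = .50:  {dependence(0.50):.3f}")       # about .37
print(f"r for d = .50:  {correlation_for(0.50):.3f}")  # about .707
```

A correlation of .50 thus corresponds to about 37 per cent, and 50 per cent dependence requires an r of about .707, as stated.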

The work of converting from r to d is not difficult. It can be 
easily done by substituting in the formula, and it can be made still 
easier by the use of a simple table giving values of d corresponding to 
values of r from 0 to 1. Such a table follows. 

TABLE GIVING VALUES OF d FOR VALUES OF r FROM 0 TO 1 BY HUNDREDTHS 

                               Hundredths
Tenths     0     1     2     3     4     5     6     7     8     9

  0        0    .01   .02   .03   .04   .05   .06   .07   .07   .08
  1       .09   .10   .11   .12   .12   .13   .14   .15   .15   .16
  2       .17   .18   .18   .19   .20   .20   .21   .22   .23   .23
  3       .24   .25   .25   .26   .27   .27   .28   .28   .29   .30
  4       .30   .31   .32   .32   .33   .34   .34   .35   .35   .36
  5       .37   .37   .38   .38   .39   .40   .40   .41   .42   .42
  6       .43   .44   .44   .45   .45   .46   .47   .48   .48   .49
  7       .50   .50   .51   .52   .52   .53   .54   .55   .55   .56
  8       .57   .58   .59   .60   .61   .61   .63   .64   .65   .66
  9       .67   .69   .70   .72   .73   .75   .77   .80   .83   .88

For r = 1.00, d = 1.00. 


A NOTE ON THE STANDARD ERROR OF THE 
SPEARMAN-BROWN FORMULA 


EUGENE SHEN 
Stanford University 


The reliability of an average score on a number of comparable 
forms of a test can be estimated from the reliability of a single form by 
means of the Spearman-Brown formula, sometimes spoken of as the 
Spearman prophecy formula. To avoid subscripts, it may be written as 

$$R = \frac{ar}{1 + (a - 1)r}$$

where R is the estimated reliability of an average of a tests each with a 
reliability of r. A formula for calculating its standard error was 
proposed in a previous note.¹ Its derivation was very simple, the full 
process being as follows: 





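$$R = \frac{ar}{1 + (a - 1)r}$$

Taking its derivative, 

$$dR = \frac{a\,dr}{[1 + (a - 1)r]^2}$$

$$\frac{\sum (dR)^2}{N} = \frac{a^2 \sum (dr)^2}{N[1 + (a - 1)r]^4}, \quad\text{or}\quad \sigma_R^2 = \frac{a^2\sigma_r^2}{[1 + (a - 1)r]^4} = \frac{a^2(1 - r^2)^2}{N[1 + (a - 1)r]^4}$$

Hence 

$$\sigma_R = \frac{a(1 - r^2)}{\sqrt{N}\,[1 + (a - 1)r]^2}$$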

The value was frankly an approximation, higher-order differences 
being neglected in the process of differentiating. 
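
As a sketch of how the short formula is applied, the functions below evaluate the Spearman-Brown estimate and its approximate standard error; the function names are hypothetical, and the trial values N = 100, r = .1, a = 11 are those of the example discussed in this note.

```python
import math

def spearman_brown(r: float, a: float) -> float:
    """Estimated reliability R of an average of `a` comparable forms."""
    return a * r / (1.0 + (a - 1.0) * r)

def se_r(r: float, n: int) -> float:
    """Standard error of a product-moment r: (1 - r^2) / sqrt(N)."""
    return (1.0 - r * r) / math.sqrt(n)

def se_short(r: float, a: float, n: int) -> float:
    """Short (first-order) standard error of R: sigma_r times dR/dr."""
    return a * se_r(r, n) / (1.0 + (a - 1.0) * r) ** 2

print(f"R        = {spearman_brown(0.1, 11):.3f}")   # 0.550
print(f"sigma_r  = {se_r(0.1, 100):.3f}")            # 0.099
print(f"sigma_R  = {se_short(0.1, 11, 100):.3f}")    # 0.272
```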

More recently Holzinger and Clayton² have published a longer 
formula, which may be written as 

$$\sigma_R = \frac{a(1 - r^2)}{\sqrt{N[1 + (a - 1)r]^4 + (a - 1)^2[1 + (a - 1)r]^2(1 - r^2)^2}}$$

In giving this result and correctly noting that the short formula 
was a mere approximation, they left the unfortunate impression that 
standard errors should properly be derived without ever neglecting 
higher-order differences, that the long formula was so derived, and 
that therefore it was the correct one to use. These three points I 
have failed to verify. 


¹ The Standard Error of Certain Estimated Coefficients of Correlation. Journal 
of Educational Psychology, October, 1924, Vol. XV, pp. 462-465. 

² Further Experiments in the Application of Spearman’s Prophecy Formula. 
Journal of Educational Psychology, May, 1925, Vol. XVI, pp. 289-299. 

Following the example given by Holzinger and Clayton, if N = 
100, r = .1, and a = 11, then R = .55, for which the short formula 
gives a standard error of .272 while the long formula yields .244. 
“The latter, of course, is the more correct result and a difference of 
.03 can hardly be neglected.” The truth of the last statement, notwithstanding 
the phrase “of course,” is far from obvious. In the given 
example, the standard error of r is .099. If now we obtain as samples 
of r values between .078 and .122 in a range from 2/9 sigma below to 
2/9 sigma above .1, the correct r, we shall then derive values of R 
between .4820 and .6045, a range of .1225, which is 4/9 of .2756, much 
closer to .272 than to .244. 

As can be seen, the difference between the long and the short 
formula increases as the population decreases. Let us change N 
to 25 in order that the “more correct” formula may manifest its full 
superiority. Let .1 still be the correct r. Its standard error now 
becomes .198, twice as great as before, and the standard error of R 
will be .5445 according to the short formula and .3870 according to 
the long one. If we now consider a range from 1/9 sigma below to 
1/9 sigma above the correct r we shall obtain values of r between .078 
and .122 and shall derive values of R between .4820 and .6045, the same 
as before. The values of R, as before, will cover a range of .1225, which 
is 2/9 of .55125, much closer to .5445 than to .3870. 
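
The arithmetic of the two examples can be retraced with the sketch below. It is an illustration only: the long formula is written in the form that reproduces the .244 and .3870 quoted in the text, which is an assumption wherever the printed formula is hard to read, and the function names are hypothetical.

```python
import math

def spearman_brown(r, a):
    return a * r / (1.0 + (a - 1.0) * r)

def se_short(r, a, n):
    """Short formula: a(1 - r^2) / (sqrt(N) [1 + (a-1)r]^2)."""
    return a * (1.0 - r * r) / (math.sqrt(n) * (1.0 + (a - 1.0) * r) ** 2)

def se_long(r, a, n):
    """Long formula, in the form matching the values quoted in the text (assumed)."""
    base = 1.0 + (a - 1.0) * r
    q = 1.0 - r * r
    return a * q / math.sqrt(n * base ** 4 + (a - 1.0) ** 2 * base ** 2 * q ** 2)

r, a = 0.1, 11
low, high = spearman_brown(0.078, a), spearman_brown(0.122, a)
spread = high - low                      # range of R actually produced

for n in (100, 25):
    sigma_r = (1.0 - r * r) / math.sqrt(n)
    half_width = 0.022 / sigma_r         # 2/9 sigma for N = 100, 1/9 for N = 25
    implied = spread / (2.0 * half_width)  # sigma_R implied by the observed range
    print(f"N = {n:3d}: short = {se_short(r, a, n):.4f}, "
          f"long = {se_long(r, a, n):.4f}, implied by range = {implied:.4f}")
```

In both cases the value implied by the range of R lies close to the short formula and well above the long one, which is the point of the argument.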

As far as these calculations show, when the short formula slightly 
underestimates the standard error, the long formula errs in the same 
direction, only to a greater degree. Consequently the more the 
results of the long formula differ from those of the short, the more they 
deviate from the truth. Since the distribution of correlation coeffi- 
cients is often far from normal, it is conceivable that there may be 
cases where the long formula will occasionally give better approxima- 
tions. But the examples given are sufficient to call into question the 
very soundness of the derivation of the long formula, to say nothing 
of its usefulness. 

That the retention of higher-order differences in the derivation 
of standard-error formulas is the exception rather than the rule, any 
statistician can readily recognize. The standard error of the product-moment 
correlation coefficient, 

$$\sigma_r = \frac{1 - r^2}{\sqrt{N}}$$

is a notable instance. It is this value that is substituted in the 
(long, as well as short) derivation of the standard error of the Spearman-Brown 
formula. 


MUSICAL SENSITIVITY OF CHILDREN WHO TEST 
ABOVE 135 IQ (STANFORD-BINET)¹ 


LETA S. HOLLINGWORTH 


Teachers College, Columbia University 


The present report deals with results obtained from applying 
five of the Seashore tests of musical sensitivity to 49 children who 
test above 135 IQ (Stanford-Binet). These children have been pre- 
viously described elsewhere.‘ They were selected from the public 
school population of New York City within prescribed age limits 
on two considerations, regardless of all others: they must test above 
135 IQ (Stanford-Binet), and they must have their parents’ consent 
to enter the special classes, established for purposes of experimental 
education. 

Of all the children found to test above 135 IQ, and invited to join 
our classes, parents’ permission was refused to thirty-one. These 
‘“‘missing’’ children have been sampled for measurements of physique, 
and have been found not to differ in such measurements from the 49 
pupils included in the group here studied. The “missing” also con- 
stitute a random sample of all found, as regards IQ. They have not 
been tested for musical sensitivity, but there is no reason to suppose 
that parents’ consent depended in any way upon degree of musical 
sensitivity in their offspring, since no special plans for music were made 
in connection with the experimental teaching, and no parent men- 
tioned music in giving reasons for refusal to participate. We know of 
no reason why we should not assume that the 49 children here studied 
represent a random sample of the intellectually gifted, as regards musi- 
cal sensitivity. It is to be noted in this connection that 92 per cent 
of our group are Jewish children. 

¹ This report is rendered as part of the work of a joint committee, in charge of 
special opportunity classes for gifted children, at Public School 165, Manhattan. 
The members of this Committee are Mr. Jacob Theobald and Miss Jane Monahan, 
of Public School 165, Miss M. V. Cobb, Dr. Grace A. Taylor and Dr. L. S. 
Hollingworth, of Teachers College, Columbia University. The work was carried 
on with the advice of District Superintendent John E. Wade, and in co-operation 
with the Division of Educational Psychology, of the Institute of Educational 
Research, at Teachers College. The clerical and statistical work of this report 
was supported, and the necessary equipment for testing was purchased, by funds 
granted through the Institute, by the Carnegie Corporation, of New York. 

The age range of our pupils at the time the tests of musical sensitivity 
were made was 8 years 0 months to 11 years 5 months, with a 
median at 10 years 4 months. Sixty per cent of the group had passed 
the tenth, but had not reached the eleventh birthday. They cor- 
responded as a group to the birthday age norms for children in the 
fifth grade of the elementary schools in Iowa, where Seashore estab- 
lished his norms. 

In IQ, these children ranged from 135 to 190, with a median at 
153. Their median intellectual ability was, therefore, at least that 
of the average adult. Our interest centered especially in learning 
whether the scores of these children, in tests of sensory discrimination, 
judgment of time, and the like, involved in musical sensitivity, would 
correspond more closely to birthday age, or to mental age. Seashore’ 
has stated that ‘‘the difference in the norms” (from fifth grade to 
adult) ‘‘ is due chiefly to the difference in capacity for intelligent effort 
in the tests, rather than to age.” 

The tests were given by the present writer, in the regular class- 
rooms of the children, during the morning session of school. The 
five tests were taken in sequence, with brief rest periods between, on 
the same day. The children were very well accustomed to taking 
tests, and were always eager to be tested. They gave very close 
attention and exceptionally well sustained effort, as is characteristic 
of such selected groups, throughout the series. The two classes were 
tested as separate units, there being 25 pupils in one group, and 24 
in the other. The standard equipment for Seashore’s tests was used, 
one block of tests being given in every instance. 

In reporting his norms, Seashore does not invariably give the score 
at the fiftieth percentile. He sometimes gives the fifty-first percentile, 
or the forty-ninth percentile, instead. We deem such approximation 
to the exact middle of the distribution to be sufficiently close for our 
purposes, and have not refined our comparisons to the fraction of a 
per cent which would become involved by interpolating the value of 
the fiftieth percentile. 


PITCH DISCRIMINATION 


One “‘block”’ of trials in pitch discrimination gives a record of 100 
judgments for each child. The fifty-first percentile, according to 
Seashore’s norms for fifth grade pupils, is 67 per cent of correct 
judgments. 

Table I gives the distribution of our intellectually gifted pupils. 
Table II shows the distribution of their deviations from Seashore’s 
fifty-first percentile, and Fig. 1 shows this distribution graphically. 

Fig. 1. Showing distribution of deviations recorded in Table II, from Seashore’s 
fifty-first percentile for Grade V pupils. 

TABLE II. SHOWING DISTRIBUTION OF DEVIATIONS FROM SEASHORE’S FIFTY-FIRST 
PERCENTILE, IN THE CASE OF CHILDREN DISTRIBUTED IN TABLE I, 
FOR PITCH DISCRIMINATION 

Amount of deviation     Frequency
   +20 to +15               2
   +15 to +10               8
   +10 to  +5               6
    +5 to  -5              10
    -5 to -10               7
   -10 to -15               4
   -15 to -20               2
   -20 to -25               6
   -25 to -30               3
   -30 to -35               1
   Total                   49








JUDGMENTS OF INTENSITY 


One “‘block”’ of Seashore’s tests gives 100 judgments of intensity 
for each child. The fifty-first percentile, according to Seashore’s 
norms for Grade V, is 74 per cent of correctness. 

Table III gives the distribution of the intellectually gifted. Table 
IV shows the distribution of their deviations from Seashore’s fifty-first 
percentile, and Fig. 2 gives these facts graphically. 


TABLE III. SHOWING DISTRIBUTION OF PER CENT OF CORRECTNESS, IN JUDGMENTS 
OF INTENSITY, BY 49 CHILDREN OF GRADE V AGE, TESTING 
ABOVE 135 IQ 

Per cent correct     Frequency
     90-85               4
     85-80               4
     80-75              12
     75-70               9
     70-65              12
     65-60               5
     60-55               2
     55-50               0
     50-45               0
     45-40               1
     Total              49

TABLE IV. SHOWING DISTRIBUTION OF DEVIATIONS FROM SEASHORE’S FIFTY-FIRST 
PERCENTILE, IN CASE OF CHILDREN DISTRIBUTED IN TABLE III, 
FOR JUDGMENT OF INTENSITY 

Amount of deviation     Frequency
   +20 to +15               2
   +15 to +10               2
   +10 to  +5               5
    +5 to  -5              26
    -5 to -10               6
   -10 to -15               6
   -15 to -20               1
   -20 to -25               0
   -25 to -30               0
   -30 to -35               1
   Total                   49

In judgments of intensity, 44.9 per cent of the gifted exceed Seashore’s 
fifty-first percentile. This group does not quite equal, therefore, 
the unselected Grade V pupils of Iowa in this respect. Their 
deficit is, however, so small that it might disappear with an increase 
in numbers. 

JUDGMENTS OF TIME 


One “‘block”’ of trials in judgments of time gives 100 attempts for 
each child. The forty-ninth percentile, according to Seashore’s norms 
for Grade V pupils, is 64 per cent of correctness. 

Table V gives the distribution of the intellectually gifted. Table 
VI shows the distribution of their deviations from Seashore’s forty- 
ninth percentile, and Fig. 3 shows the same facts graphically. 

TABLE V. SHOWING DISTRIBUTION OF PER CENT OF CORRECTNESS, IN JUDGMENTS 
OF TIME, BY 49 CHILDREN, OF GRADE V AGE, TESTING ABOVE 
135 IQ 

Per cent correct     Frequency
     90-85
     85-80
     80-75
     75-70
     70-65
     65-60
     60-55
     55-50
     50-45
     45-40

TABLE VI. SHOWING DISTRIBUTION OF DEVIATION FROM SEASHORE’S FORTY-NINTH 
PERCENTILE, IN THE CASE OF CHILDREN DISTRIBUTED IN TABLE V, 
FOR JUDGMENT OF TIME 

Amount of deviation     Frequency
   +25 to +20               2
   +20 to +15               1
   +15 to +10               2
   +10 to  +5              15
    +5 to  -5              20
    -5 to -10               4
   -10 to -15               3
   -15 to -20               1
   -20 to -25               1
   Total                   49

In judgments of extent of time, 65.3 per cent of the gifted exceed 
the forty-ninth percentile of Seashore’s unselected pupils, as against 
the 51 per cent to be expected if there were no difference in favor 

Fig. 3. Showing distribution of deviations recorded in Table VI, from Seashore’s 
forty-ninth percentile for Grade V pupils. 


JUDGMENTS OF CONSONANCE 


One “block” of trials in judgments of consonance gives 50 attempts 
for each child. The fiftieth percentile, according to Seashore’s norms 
for Grade V, is 61 per cent of correctness. 

Table VII gives the distribution of the intellectually gifted. Table 
VIII shows the distribution of their deviations from Seashore’s fiftieth 
percentile, and Fig. 4 shows these facts graphically. 

TABLE VII. SHOWING DISTRIBUTION OF PER CENT OF CORRECTNESS, IN JUDGMENTS 
OF CONSONANCE, BY 49 CHILDREN OF GRADE V AGE, TESTING 
ABOVE 135 IQ 

Fig. 4. Showing distribution of deviations recorded in Table VIII, from Seashore’s 
fiftieth percentile for Grade V pupils. 

TABLE VIII. SHOWING DISTRIBUTION OF DEVIATIONS FROM SEASHORE’S FIFTIETH 
PERCENTILE, IN THE CASE OF CHILDREN DISTRIBUTED IN TABLE 
VII, FOR JUDGMENTS OF CONSONANCE 

Amount of deviation     Frequency
   +15 to +10
   +10 to  +5
    +5 to  -5
    -5 to -10
   -10 to -15
   -15 to -20

In correctness of judgment of consonance, but 44.9 per cent of 
these intellectually gifted children exceed Seashore’s fiftieth percentile. 


TONAL MEMORY 


One “block” of trials of tonal memory gives 50 attempts for each 
child. Seashore’s fiftieth percentile shows 50 per cent of correct 
judgments. 

Table IX gives the distribution of the intellectually gifted, in regard 
to tonal memory. Table X gives the distribution of their deviations 
from Seashore’s fiftieth percentile, and Fig. 5 shows these facts 
graphically. 


TABLE IX. SHOWING DISTRIBUTION OF PER CENT OF CORRECTNESS, IN TONAL 
MEMORY, IN THE CASE OF 49 CHILDREN, OF GRADE V AGE, TESTING 
ABOVE 135 IQ 

Per cent correct   Frequency     Per cent correct   Frequency
     90-85             1              50-45             8
     85-80             1              45-40             4
     80-75             2              40-35             5
     75-70             2              35-30             1
     70-65             8              30-25             1
     65-60             0              25-20             1
     60-55             6              20-15             1
     55-50             6              15-10             1
                                      10- 5             1
                                      Total            49

TABLE X. SHOWING DISTRIBUTIONS OF DEVIATIONS FROM SEASHORE’S FIFTIETH 
PERCENTILE, IN THE CASE OF CHILDREN DISTRIBUTED IN TABLE IX, 
FOR TONAL MEMORY 

Amount of deviation   Frequency     Amount of deviation   Frequency
   +40 to +35             1            -5 to -10              4
   +35 to +30             1           -10 to -15              5
   +30 to +25             2           -15 to -20              1
   +25 to +20             2           -20 to -25              1
   +20 to +15             8           -25 to -30              1
   +15 to +10             0           -30 to -35              1
   +10 to  +5             6           -35 to -40              1
    +5 to  -5            14           -40 to -45              1
                                       Total                 49
[Fig. 5 plot omitted. Horizontal axis: deviations from normal percentage of correctness.] 


Fig. 5. Showing distribution of deviations recorded in Table X, from Seashore’s 
fiftieth percentile for Grade V pupils. 


Fifty-three per cent of the intellectually gifted exceed the fiftieth 
percentile of Seashore’s subjects in tonal memory. Considering the 
smallness of our group this amount of difference is very probably a 
matter of chance. 
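
The judgment that such a difference is "a matter of chance" can be illustrated with an exact binomial tail. This is a sketch only: it assumes that "fifty-three per cent" corresponds to 26 of the 49 children, and that under no difference each child has an even chance of exceeding the fiftieth percentile; neither assumption is stated in the article.

```python
from math import comb

def upper_tail(n: int, k: int, p: float = 0.5) -> float:
    """Probability of k or more successes in n independent trials."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

n = 49
k = round(0.53 * n)          # 26 children, an inferred count
prob = upper_tail(n, k)

print(f"{k} of {n} exceed the norm; chance of {k} or more by luck alone: {prob:.2f}")
```

On these assumptions a result at least this favorable would arise by chance in well over a third of such samples.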




SUMMARY OF GROUP STATUS 


It is evident from data thus far presented that these intellectually 
superior children are not superior to average children of their age in 
musical sensitivity, as measured by these tests. In order to obtain a 
more summary statement of group performance, we have computed 
the mean¹ for each variable separately, as regards percentile status. 





¹ In computing to obtain means and quartiles for the group, all those falling 
below the lowest percentile listed by Seashore were arbitrarily and uniformly 
given a percentile ranking midway between zero and the lowest percentile listed. 

These means appear in Table XI. The figures in parentheses denote 
the mean (with fraction dropped) on which the AD is based. 


TABLE XI. SHOWING MEAN PERCENTILE STATUS OF 49 INTELLECTUALLY GIFTED 
CHILDREN, IN EACH TEST OF MUSICAL SENSITIVITY, PERCENTILES BEING 
BASED ON GRADE V NORMS (SEASHORE) 

                   Mean percentile of
                   intellectually gifted       AD
Pitch                  46.7 (47)             ±26.5
Intensity              50.0 (50)             ±19.9
Time                   58.0 (58)             ±23.1
Consonance             47.9 (48)             ±25.2
Tonal memory           52.3 (52)             ±22.6








All of these means, except possibly that for time, approach 50 so 
closely that we must assume our pupils to be distributed as ordinary 
Grade V children are, in the sensitivities tested. The fact that the 
mean percentile for the intellectually gifted rises to 58 in the case of 
time, taken together with the fact that 65 per cent of them exceed 
Seashore’s forty-ninth percentile on ‘‘raw”’ score, suggests that this 
performance is to some extent intellectual (that it is correlated with 
intelligence, above the level of intellect required for comprehending 
and obeying the directions for the test). 

As for pitch discrimination, perception of intensity, perception of 
consonance and tonal memory, they seem not to be correlated with 
intellect (when all members of the tested group are above the minimum 
of intelligence required to carry out the directions). They appear as 
variables independent of intellect. 

This fact is further shown within our own group, when its various 
quartiles are compared with each other, as in Table XII. Here the 
median IQ and median birthday age have been computed for each 
quartile of our group (the Q’s being based on final rank for musical 
sensitivity). 

Within the IQ limits of this highly restricted group, there is no 
difference between the first and the third quartiles, as regards IQ. 
The fourth quartile drops below the other three, but this is undoubtedly 
a mere chance variation, due to the smallness of the sample represented 
by a quartile. There are five children of IQ above 150 in this 
lowest quartile. 

TABLE XII. SHOWING HOW THE QUARTILES OF A GROUP OF 49 INTELLECTUAL 
CHILDREN COMPARE, AS REGARDS IQ AND BIRTHDAY AGE 

Musical sensitivity based on                      Median age (birthday)
final rank in all five tests      Median IQ       years      mos.
together
First quartile                      156.5            9        11
Second quartile                     153.0           10         6
Third quartile                      156.5           10         8
Fourth quartile                     142.5           10         2














In order to find “‘final rank” in musical sensitivity for each child, 
we took the child’s percentile status, according to Seashore’s percentiles, 
in each variable. We then found (1) the child’s mean status in the 
five tests; (2) his median status in the five tests; (3) his rank within his 
group according to his mean; (4) his rank within his group according 
to his median. As there appeared slight discrepancies in the two 
rankings thus accomplished, a ‘final rank”’ was determined by striking 
an average between the two rankings. It is this “final rank’’ on which 
the quartiles in Table XII are based. 
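
The ranking procedure just described can be sketched as follows. The percentile scores for the three children are invented for illustration, and two details are assumptions not stated in the text: tied children share the average of the ranks they span, and "striking an average" is taken to mean the arithmetic mean of the two ranks.

```python
from statistics import mean, median

def ranks(scores):
    """Rank 1 = highest score; tied scores share the average of their ranks."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of equal scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for m in range(i, j + 1):
            result[order[m]] = shared
        i = j + 1
    return result

def final_rank(percentiles_by_child):
    """Average of the rank by mean status and the rank by median status."""
    means = [mean(p) for p in percentiles_by_child]
    medians = [median(p) for p in percentiles_by_child]
    by_mean, by_median = ranks(means), ranks(medians)
    return [(a + b) / 2 for a, b in zip(by_mean, by_median)]

# Hypothetical percentile statuses in the five tests for three children.
children = [
    [80, 88, 90, 60, 82],
    [40, 55, 70, 35, 50],
    [10, 20, 45, 95, 100],
]
print(final_rank(children))  # [1.0, 2.5, 2.5]
```

The second and third children illustrate the "slight discrepancies" mentioned: one stands higher by mean, the other by median, and the averaged rank places them together.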


REMARKS ON INDIVIDUAL CASES 


As a result of ranking, one child appears as decidedly the best of 
the 49, in musical sensitivity. This is a boy, 9 years 7 months old, of 
IQ 172. His mean percentile status on the five tests is 80; his median 
status, 88. This child studies music, and plays the piano. His 
teacher of music, with whom he has studied for about 12 months, 
made the following independent judgment of his musical abilities: 








                          Before training     After training
Sense of pitch            Excellent           Excellent
[illegible]               Excellent           Excellent
Singing interval          Excellent           Excellent
[illegible]               Good                Good
Quality of voice          Good musical        Good musical
Sense of rhythm           Excellent           Excellent
Control of rhythm         Fair                Good
[illegible]               Fair                Good
Emotional reaction        Much interested     More and more responsive

“Interest in music is most pronounced. He seems perfectly 
fascinated, but is impatient to finish and wants new pieces all the time. 
His progress in piano playing is remarkable.” 

Of the two boys who attained second rank, one has not studied 
music, and the other has studied with the same teacher, who instructs 
the boy ranked first by test. This teacher assigned ratings to this 
second child of ‘‘excellent,” in all traits listed above. She regards him 
as very musical, with power of absolute pitch. In the tests, his 
percentile rating for pitch is 87, and for tonal memory 98. 

None of our intellectually gifted children is shown by the Seashore 
tests to be extremely gifted in musical sensitivity. These children all 
fall into the hundredth percentile of their age group for intellectual 
ability, but not one of them achieves even the ninetieth percentile as 
a mean or median performance in musical sensitivity. 

The lowest scores in our group were made by a girl, 10 years 0 
months old, with an IQ of 151. Both her mean and median scores fell 
far below the lowest percentile rated by Seashore. Several other 
children were nearly as poor as she in their performance. 


CORRELATION AMONG THE ELEMENTS OF MUSICAL SENSITIVITY 


As a matter of incidental interest, correlations were made among 
the various phases of sensitivity tested. These correlations are based 
upon the percentile ratings of the children in our group, using Sea- 
shore’s distribution. The coefficients thus resulting appear in Table 
XIII. They show that in this group, the variables involved in musical 
sensitivity are independent of each other to a marked degree. A 
child may stand high in some of them and low in others. There is 
tendency to positive correlation, slight in all cases, in seven of the 10 
combinations which result from the five elements tested. 

TABLE XIII.—SHOWING COEFFICIENTS OF CORRELATION BETWEEN ELEMENTS 
OF MUSICAL SENSITIVITY, IN A GROUP OF 49 CHILDREN WHO TEST ABOVE 
135 IQ. THE CORRELATIONS ARE BETWEEN PERCENTILE RATINGS IN 
THE SEVERAL TRAITS, BASED ON SEASHORE’S NORMS FOR GRADE 
V PUPILS 

                     Pitch   Intensity      Time           Consonance     Tonal memory 

Pitch..............          ρ = .14        ρ = .49        ρ = .33        ρ = .44 
                             r = .15        r = .51        r = .34        r = .46 
                             PE_r = ±.09    PE_r = ±.07    PE_r = ±.09    PE_r = ±.08 

Intensity..........                         ρ = .08        ρ = .09        ρ = −.09 
                                            r = .08        r = .09        r = −.09 
                                            PE_r = ±.10    PE_r = ±.10    PE_r = ±.10 

Time...............                                        ρ = .45        ρ = .24 
                                                           r = .47        r = .25 
                                                           PE_r = ±.08    PE_r = ±.09 

Consonance.........                                                       ρ = .33 
                                                                          r = .34 
                                                                          PE_r = ±.09 

Tonal memory....... 
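
The table does not state how r and its probable error were derived from the rank coefficient ρ. As a hedged consistency check, the printed figures agree with two formulas in common use at the time: Pearson's conversion r = 2 sin(πρ/6), and the probable error PE_r = 0.6745(1 − r²)/√N with N = 49. The sketch below assumes those two formulas (the function and variable names are illustrative, not taken from the article) and recomputes every cell of Table XIII.

```python
import math

N = 49  # number of children, as given in the table title

# (pair, rho, printed r, printed PE_r), transcribed from Table XIII
TABLE_XIII = [
    ("pitch-intensity",        0.14,  0.15, 0.09),
    ("pitch-time",             0.49,  0.51, 0.07),
    ("pitch-consonance",       0.33,  0.34, 0.09),
    ("pitch-tonal memory",     0.44,  0.46, 0.08),
    ("intensity-time",         0.08,  0.08, 0.10),
    ("intensity-consonance",   0.09,  0.09, 0.10),
    ("intensity-tonal memory", -0.09, -0.09, 0.10),
    ("time-consonance",        0.45,  0.47, 0.08),
    ("time-tonal memory",      0.24,  0.25, 0.09),
    ("consonance-tonal memory", 0.33, 0.34, 0.09),
]


def rho_to_r(rho):
    """Assumed conversion from rank correlation to product-moment r."""
    return 2 * math.sin(math.pi * rho / 6)


def probable_error(r, n=N):
    """Assumed classical probable error of r: 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r * r) / math.sqrt(n)


def check_table(rows=TABLE_XIII):
    """Return the pairs whose printed r or PE_r differs from the recomputed value."""
    mismatches = []
    for pair, rho, r_printed, pe_printed in rows:
        r_calc = round(rho_to_r(rho), 2)
        pe_calc = round(probable_error(r_printed), 2)
        if (r_calc, pe_calc) != (r_printed, pe_printed):
            mismatches.append((pair, r_calc, pe_calc))
    return mismatches


if __name__ == "__main__":
    print(check_table())
```

An empty list from `check_table()` means that every printed r and PE_r is reproduced to two decimals under these assumptions; it does not establish that the authors used exactly this procedure.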


CONCLUSIONS 

As a result of testing for musical sensitivity in a group of children 
of Grade V age (median 10 years 4 months), all standing above 135 IQ 
(Stanford-Binet), we offer the following conclusions: 

1. Above the level of intelligence required to understand and 
execute the directions for taking the Seashore tests (mental age of 
about 10 years), performance in pitch discrimination, perception of 
intensity, perception of consonance, and tonal memory is not symptom- 
atic of intellectual endowment. Children testing in the highest 1 
per cent for intellect, and achieving in school work generally according 
to expectation therefrom, distribute as random selections of Grade V 
children of their age do, in the sensitivities mentioned. Though they 
meet tests of intelligence as well as or better than average adults do, 
they meet tests of musical sensitivity only as well as average 10-year- 
olds can. 

2. It is suggested by our findings that judgment of short intervals 
of time is correlated, though not closely, with intellect; that it is not 
an altogether independent variable. No doubt it is less a function 
of special anatomical structures outside the cortex, than are the other 
forms of discrimination included in the Seashore tests. 

3. Performance in tests of musical sensitivity is closely connected 
with birthday age, within a group of the intellectually gifted, and is 
not connected with mental age (except possibly in the case of judging 
time). Such performance is even less helpful in identifying high 
degrees of intellectual endowment, than are measurements of physical 
size. As regards the latter the chances favor finding superior intellect 
in a tall, heavy child; but they neither favor nor disfavor the discovery 
of superior intellect in a child who is very sensitive to pitch, intensity 
and consonance, or who is retentive of tones heard. 

4. Since the intellectually gifted children are, as a group, larger than 
unselected children of the same age,‘ it might be expected that they 
would excel in tests such as were given here, solely on the basis of a 
more advanced development of the special anatomical structures 
involved. Such is, however, not the case. Perhaps this means that 
the large children of our group are not merely accelerated in develop- 
ment, as some have surmised, but that they are the large members of 
their species—a distinction with a decided difference. 


THE STATUS OF UNIVERSITY INTELLIGENCE 
TESTS IN 1923-1924 


HERBERT A. TOOPS 


Ohio State University 


(Continued from January Issue) 


Question 11.—With reference to the first recitation day or opening 
day of school of each semester, when are the tests given? 


About. ..... days before opening of school. 
About. ..... days after opening of school. 


The results show that by the end of the second week before school 
opens 9.2 per cent of the 66 colleges have given their tests; an addi- 
tional 16.6 per cent give the tests during the week preceding the opening 
of school. Thus, in all 25.8 per cent of all the colleges have completed 
their testing before the opening day of school. Then, 25.0 per cent 
give the test within one week after the opening of school; so that 
almost exactly half of the tests have been given before or during the 
first week of school. An additional 20.9 per cent give the tests in 
the second week after school opens and the remaining 28.3 per cent 
are apparently given whenever convenient during the remainder of 
the semester. It is evident, as previously pointed out in connection 
with the discussion on sectioning, that the final test results are received 
too late for purposes of sectioning newly entering freshmen in the case 
of at least three-fourths or perhaps even 90 per cent of the colleges 
giving tests. The obvious remedies are earlier testing—previous 
to entrance insofar as possible—and the employment in the scoring 
of a large enough clerical force to complete the scoring and compilation 
of results in a very few days. 
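
The running totals quoted in this paragraph can be checked with a few lines of arithmetic. The following is a minimal sketch using only the percentages stated above; the labels and the function name are illustrative.

```python
# Percentage of the 66 colleges giving their tests in each period (from the text)
TIMING = [
    ("by end of second week before opening", 9.2),
    ("week preceding opening", 16.6),
    ("first week after opening", 25.0),
    ("second week after opening", 20.9),
    ("remainder of the semester", 28.3),
]


def cumulative(rows):
    """Return (label, running percentage) pairs, rounded to one decimal."""
    total = 0.0
    out = []
    for label, pct in rows:
        total += pct
        out.append((label, round(total, 1)))
    return out


if __name__ == "__main__":
    for label, running in cumulative(TIMING):
        print(f"{running:5.1f}  {label}")
```

The second running total is the 25.8 per cent tested before opening day, and the third (50.8) is the "almost exactly half" tested by the end of the first week.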

Question 12 is here omitted because it was evidently misunderstood 
or confused with question 13 following as only two-thirds of the 
66 colleges filled it out. 

Question 13.—How many of each of the following groups were 
tested during the academic year 1922-23 (excluding summer school 
students if tested). 

(a) Applicants for freshman standing. 
(b) Applicants for advanced standing. 
(c) Enrolled freshmen. 
(d) Enrolled upper classmen (excluding graduates). 
(e) Graduates. 

This question was evidently badly worded and so was misunderstood, 
as 22 of the 66 colleges failed to answer it. Of the 44 
colleges which answered the question, there are 9 which test over 
1000 students annually; a second group of 9 colleges test over 500 but 
fewer than 1000 students annually. The author wishes merely to 
point out the reliability of intercorrelation coefficients of tests which 
will become available whenever the magazines shall contain inter- 
correlations based on 1000 to 2000 or more students. Ohio State 
University for instance (not included in this report) in the autumn 
quarter of 1924-25 alone, tested 1150 freshmen on Form A and a 
second 1150 on Form B; it will test approximately an additional 1000 
freshmen in the three remaining quarters of the year. It is indeed 
time that mechanical tabulating machines be brought to bear on the 
analysis of the information contained in the 20,000 intelligence test 
records now in the vaults of some of the colleges of our survey. Not 
a few of these colleges, as shown by Table I, have had a comprehensive 
testing program in force for as long as six years. Many of the subjects 
first tested have now been at work out in the world for from two to 
five years. It is to be hoped that tabulating machinery, trained 
statistical and research personnel and research funds will be available 
for making at least a few comprehensive follow-up investigations of 
the success of former students of different intelligence levels, and for 
exhausting the possibilities of analyzing the factors making for poor 
and good scholarship in the university. Universities have scarcely 
begun the type of research on college educational methods which goes 
by the name of controlled pedagogical experimentation. What a 
variety of controlled pedagogical experiments could be conducted by 
a number of colleges using the same tests! How accurately material 
could be scaled as to difficulty when given to as many as 1000 or more 
entering freshmen, not to mention upperclassmen and graduates! 

Question 14.—Are students given the results of their tests? 
Answer “Yes” or “No.” 

Question 16.—Do you favor giving students the results of the tests? 
Answer “Yes” or “No.” 

Of 66 colleges, 48 (or 72.7 per cent) state that the results are given 
out to students. Of these 48, there are 39 which are in favor of giving 
out the results; 7 are in doubt as to whether they should be given out 
or not; while 2 which do give out the results state that they are not in 
favor of giving them out. (It is to be remembered that question 16 
has to do with the personal opinion of the correspondent to the 
questionnaire.) 

There are thus left only 18 (or 27.3 per cent) of the 66 colleges 
which do not give out the results to students. Of these 18, there are 
6 which are in favor of giving out the results, and 12 not in favor of 
giving them out (of these it is obvious from the reply that at least 2 of 
the largest colleges are not in favor of so doing because of the vast 
labor and expense involved in giving them out by any scheme other 
than by a distribution-by-mail method). 

Altogether there are 45 in favor of giving out the results, 14 defi- 
nitely opposed, and 7 in doubt as to the wisdom of giving out test 
results to students. 
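
The counts reported for questions 14 and 16 form a small two-way table of practice against opinion. A minimal sketch of that cross-tabulation follows, using only the figures given above; the dictionary layout and the names are illustrative.

```python
# Rows: what the college does; columns: the correspondent's own opinion.
CROSSTAB = {
    "gives out results": {"favor": 39, "in doubt": 7, "opposed": 2},
    "withholds results": {"favor": 6, "in doubt": 0, "opposed": 12},
}


def row_totals(table):
    """Number of colleges following each practice."""
    return {practice: sum(cells.values()) for practice, cells in table.items()}


def column_totals(table):
    """Number of correspondents holding each opinion."""
    totals = {}
    for cells in table.values():
        for opinion, count in cells.items():
            totals[opinion] = totals.get(opinion, 0) + count
    return totals


def percentages(totals):
    """Each total as a percentage of all colleges, to one decimal."""
    n = sum(totals.values())
    return {key: round(100 * value / n, 1) for key, value in totals.items()}


if __name__ == "__main__":
    print(row_totals(CROSSTAB))
    print(percentages(row_totals(CROSSTAB)))
    print(column_totals(CROSSTAB))
```

The row totals give the 48 and 18 colleges (72.7 and 27.3 per cent), and the column totals give the 45 in favor, 7 in doubt, and 14 opposed.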

Question 15.—If “Yes,” under what restrictions (if any)? 

Question 17.—If “Yes,” what procedure or restrictions do you 
suggest? 

The precautions taken in giving out test results to students vary 
widely. One college gives them out at chapel in sealed envelopes 
just the same as ordinary “flunk” notices; at the other extreme we 
find colleges giving out the results only after graduation or only 
through the student’s special advisor after careful “preliminary prep- 
aration.” Several of the colleges have printed booklets explaining in 
some detail the significance of the scores, and the fallibility of test 
results in order thus to lighten the burden of interviewing many 
students. Quite a few—too many in fact—hope to escape possible 
injustice to the student of low test score on a short-timed, unreliable or 
non-valid test by telling him only the fifth, quartile or tertile of the 
class in which his score falls. Categorical scores of satisfactory- 
unsatisfactory and above-average—below-average are found. Other 
precautions are: 

1. The scores are given out only in the case of high scoring students. 

2. Given out only when students come for interview individually. 

3. Given out only under precautions that the student gets only his 
own score. 

4. Given out upon formal application only. 

5. Given out to student when requested by Dean or other admin- 
istrative officer. 

6. Given out only after one year in college. 

7. Given out only under promise that the result will be treated as 
confidential. 


8. Given out only at the discretion of the Dean. 
9. Given out only upon satisfactory establishment of identification. 

10. Given out only after graduation. 

Evidently all these precautions, wise or absurd as they may be in 
any particular case, are in large part the outgrowth of two conditions: 

1. The common belief that intelligence tests measure something 
innate and unimprovable and that consequently discovery of his 
dullness (granting he hasn’t already guessed it) by a dull student will 
lead to hopeless despair (a fate presumably worse in its moral effects 
than now results from the time-honored and respectable habit of going 
home to stay at Thanksgiving or Christmas time). 

2. The evidently burdensome task which results from any method 
of attempting to tell the student personally the significance of his 
scores. Compare the job with that of the registrar and the gymna- 
sium director of a large university should both of these attempt individ- 
ually to point out to the student the significance and the causes of his 
scores on his academic examinations and his physical examinations 
respectively! 

It is probably significant that those colleges which do not give out 
results have tests which are some .04 lower on the average in validity 
than those which do give out the results. 

It is perfectly obvious that with a hypothetical test which has per- 
fect validity in predicting a student’s persistence, a college could not 
long remain in doubt as to the advisability of giving out the scores. 

In one college the names of the top 10 per cent of scores on the 
tests are published; in another the names of those in the top quartile. 

In addition to pointing out the general conviction of colleges that 
the test score should be made available to students, this inquiry 
preeminently points out how little we know of human incentives and 
human reactions thereto. Cannot some one experimentally discover 
what is likely to be the effect on a 5-percentile person of telling him he is 
a 5-percentile person; also of telling him he is a 95-percentile person? 
Also the effect of telling a 95-percentile person he is a 5-percentile person? 
If the present widespread concern is over the possibility that we will 
tell some 95-percentile person that he is a 5-percentile person or vice 
versa then it is high time that our tests were made more reliable and 
particularly more valid than at present. High reliability of test scores 
can be readily gained by the simple expedient of lengthening our tests. 


One of the most sensible answers to question 17 was: “Make it an 
informal and helpful matter.” It is pertinent to inquire whether the 
present methods fulfill these two requirements. 

Question 18.—What penalty is inflicted for absence from the intelli- 
gence examinations without adequate excuse? 

Of 66 colleges, 27 do not have this problem and so have no penalty. 
Six colleges do not answer, and 11 require merely a later test. In 7 
more, students are not admitted to class work until the test is taken. 
Six have fines, three of these being five dollars each as a “fee for the 
privilege of later examination.” In five, absences are counted as 
“class cuts.” In two colleges, class credit is withheld until the 
examination is taken. The two remaining colleges either place the 
student on probation or summon him to the Dean’s office for discipline. 

The problem is not met by colleges which give the tests before school 
opens, as a certificate from the examiner certifying that the test is 
completed is a sine qua non to completion of registration. It should be 
noted that the physical education departments had this problem in 
the early days of physical examination and that they also had the similar 
problem of deciding to what extent to intrust the student with the 
results of his physical examination. With merely the passage of 
time the custom of giving physical examinations became a tradition 
and most of such difficulties disappeared. And no physical examiner 
nowadays hesitates to discuss with a student the significance of his 
physical examination. 

Question 19.—What educational officer is in charge of the intelli- 
gence files and records, or where are they kept? 

Table V shows the number of colleges which deposit one copy of 
student ratings with each of the administrative officers named. The 
66 colleges officially make use of 102 copies of the records. 

In a large university of 10,000 to 20,000 students the problem of 
duplicating the test records so as to make them available for the 
Deans of all the colleges and all administrative and teaching officers 
who can profit by them is a serious problem. To be of most use, the 
record should partake of the nature of a card file kept up to date. 
The problem is thus one of how, in practice, to multiply many times, 
quickly, accurately, and cheaply a “live file” of as many as 20,000 
students. This requires also cooperation with the registrar’s office. 

TABLE V.—DISTRIBUTION BY NUMBER OF COLLEGES AND BY UNIVERSITY 
ADMINISTRATORS OF COPIES OF UNIVERSITY INTELLIGENCE TEST RECORDS USED 

Administrator or office                                Number of colleges 
Psychology Department.....................................  31 
[illegible]...............................................  15 
[illegible]...............................................  15 
[illegible]...............................................   8 
[illegible]...............................................   7 
[illegible]...............................................   6 
[illegible]...............................................   4 
[illegible]...............................................   4 
Personnel or Research Bureau..............................   3 
[illegible]...............................................   3 
[illegible]...............................................   3 
[illegible]...............................................   1 
[illegible]...............................................   1 
[illegible]...............................................   1 
    Total copies of records given out by 66 colleges...... 102 

Question 20.—Please state here any definitely formulated rules 
(or statistical equations) which you apply in any of the above cases 
(e.g., Students in College X who make less than an intelligence score 
of percentile Y, will not be allowed to take more than 12 hours work 
per semester). 

Four of the 66 colleges have definitely formulated rules regarding 
the amount of work which can be taken. One college has a maximum 
number of hours for low students and a minimum allowable number of 
hours of work for bright students. Three of the four colleges appar- 
ently seem to regard taking extra hours as a sort of privilege which 
may be dispensed to the “deserving” student under rigorous condi- 
tions! This seems particularly true of the college which allows a 
student to take extra hours only in case he has high intelligence and 
high scholarship, i.e., high ability and high AQ so to speak. Adminis- 
tratively this policy is “safe”; the student enrolling for extra hours 
under this condition is very likely to pass all his work with credit. 
Such a rule utterly disregards the pedagogical finding in undergraduate 
education that oftentimes an increased load of work is all that is 
required to get a bright but lazy and indifferent student to put forth 
effort consistent with his ability. 

Question 21.—How many alternative forms of your test or tests 
do you have available? 

Aside from Army Alpha and the Thorndike Test it appears that 
there were in 1923-24 no tests in use for college testing in which 
three or more forms were prepared at one time and were available for 
prevention of coaching, providing means of greatly lengthening the 
time limits and amount of content while keeping difficulty constant, 
measuring the effects of coaching, provision of adequate test controls 
in practice experiments and the many other uses to which a large 
number of alternative forms of tests can be put. Several colleges 
were in the habit of preparing annually a new form of test; doubtless 
at great expenditure of labor and sacrifice of research time and with 
a minimum of satisfactoriness of results. The great majority of 
the colleges have only one or at best two forms of the test which they 
use. A certain amount of cooperation in the derivation and use of a 
common test series prepared in many forms would surely be to the 
advantage of many of the colleges using tests. 

Question 22.—What correlation coefficient is most characteristic 
of the statistical reliability (self-correlation) of alternative forms of 
your intelligence test for unselected college freshmen? 

The tabulation of results shows 2 colleges which have reliability 
coefficients of from .75 to .79; 3 colleges, .80 to .84; 8 colleges, .85 to 
.89; 4 colleges, .90 to .94, and 1 college .95 to .99. The average reliabi- 
lity of these 18 colleges’ tests is .87, which is rather good. One cannot 
help wondering why 48 colleges do not know the reliability of their 
tests. From one point of view low reliability means the possibility 
of improvement by the simple expedient of lengthening the test. 
The improvement in validity, due to increased length of test is always 
much less marked than the increase in reliability, the improvement 
being greater for tests of low initial reliability and vice versa. The 
validity coefficient is also increased by lengthening the test. For 
reliability coefficients of .81 and higher this means an 11 per cent or less of 
of maximal possible improvement in the validity coefficient. Evi- 
dently then, the difficulty of the test and the kind of content are much 
to be considered as well as the time limits of the test. 
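
The figures in this paragraph can be reproduced under stated assumptions. The article names no formulas; the sketch below assumes the Spearman-Brown formula for the reliability of a test lengthened k times, and the companion result that the validity of an indefinitely lengthened test approaches the original validity divided by the square root of the reliability. Interval midpoints stand in for the colleges' exact coefficients, and all names are illustrative.

```python
import math

# (interval, number of colleges), from the tabulation reported above
RELIABILITY_GROUPS = [
    ((0.75, 0.79), 2),
    ((0.80, 0.84), 3),
    ((0.85, 0.89), 8),
    ((0.90, 0.94), 4),
    ((0.95, 0.99), 1),
]


def grouped_mean(groups):
    """Mean reliability, taking each interval at its midpoint (an approximation)."""
    n = sum(count for _, count in groups)
    total = sum((lo + hi) / 2 * count for (lo, hi), count in groups)
    return total / n


def spearman_brown(reliability, k):
    """Assumed formula: reliability of a test lengthened k times."""
    return k * reliability / (1 + (k - 1) * reliability)


def lengthened_validity(validity, reliability, k):
    """Assumed formula: validity of a test lengthened k times."""
    return validity * math.sqrt(k / (1 + (k - 1) * reliability))


def max_validity_gain(reliability):
    """Largest proportional gain in validity obtainable by lengthening alone."""
    return 1 / math.sqrt(reliability) - 1


if __name__ == "__main__":
    print(round(grouped_mean(RELIABILITY_GROUPS), 2))  # average reliability
    print(round(spearman_brown(0.81, 2), 3))           # doubled test
    print(round(100 * max_validity_gain(0.81), 1))     # per cent gain, upper limit
```

Under these assumptions the grouped mean of the 18 colleges comes to .87, and a test of reliability .81 can gain at most about 11 per cent in validity from lengthening, which agrees with the statement in the text.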

Question 24.—What correlation coefficient is most characteristic of 
the relationship of freshmen first semester total school marks in your 
college or university to freshmen second semester total school marks? 

The reliability of first semester with second semester college marks 
varies greatly. Table VI is interesting in this connection. 

Although the average reliability of .66 is somewhat higher than 
some have led us to expect, it is obvious that not a few colleges must 
remain content with fairly low validity coefficients so long as their 
college marks remain so notoriously unreliable. And what, we wonder, 
is the reliability of college marks in the other 49 colleges? The 
answer to this question alone might practically answer the commonly 
asked question, “What is the next step in testing?” by largely 
redirecting our efforts towards the improvement of college marks 
which, if as unreliable as at present, we must continue in many colleges 
to predict but poorly by the best tests we can construct. 

TABLE VI.—THE RELIABILITY COEFFICIENTS OF FIRST SEMESTER WITH SECOND 
SEMESTER COLLEGE MARKS IN 17 COLLEGES 

Reliability coefficient of college marks      Number of colleges 
.80-.84 
.75-.79 
.70-.74 
.65-.69 
.60-.64 
.55-.59 
.50-.54 
.45-.49 
.40-.44 

Question 25.—As a result of your experiences with these tests what 
constructive suggestions have you to make with respect to the methods 
of administration of such tests, construction of the tests, uses to which 
the tests should be put, the value of such tests, or the most needed 
immediate researches or future lines of probable progress? 

Noteworthy suggestions for improvements in our university intel- 
ligence tests are as follows: 

1. Use tests to check up on high schools which consistently gradu- 
ate unprepared students. 

2. We should develop AQ procedures for social groups of college 
students in order to determine what groups of students are failing to 
respond adequately to their opportunities. 

3. We should adapt our college intelligence tests more to the needs 
of women students. (Mentioned by three women’s colleges.) 

4. Determination of the minimum length of test which will yield 
the maximum reliability. 

5. Attempt to have all test results available by opening day of 
school. 


6. Better standardization of methods of scoring tests. 

7. Development of tests for assisting in giving advice as to selection 
of college within a university. 

8. Retesting on individual tests students who score low on the 
first test. 

9. Tests given prior to matriculation afford maximum incentive to 
the student to do his best on the tests. 

10. The lower and upper quartiles of students should be given addi- 
tional tests to measure their specialized capacities adequately. 

11. Determine the reliability of college marks throughout the four 
years of college work. 

12. Give to all foreigners, and all who prefer it, an individual 
instead of a group test. 

13. Develop a technique for student mental hygiene advice. 

14. Make up our batteries of tests, from tests each of which will 
stand on its own feet as an entity, and find differential weights for 
each test. 

15. The content of the test should depend somewhat upon the 
vocational character of the college. 

16. Test results should be used for follow-up purposes. 


17. Develop tests with a low practice gain upon taking the test 
a second time. 


PART II. THE USE OF INTELLIGENCE TESTS BY COLLEGES OF THE 
STATE OF OHIO 


Questions 1-5 only were sent to 50 Ohio Colleges, of whom 44 
answered the questionnaire. 

Of these 50 colleges, 17 officially used tests in 1923-24 and 2 used 
them unofficially, 19 in all, or 43.2 per cent of those answering the 
questionnaire. Twenty-five did not use them. The names of colleges 
using and not using tests in 1923-1924 are shown in Table VII. 

TABLE VII.—USE AND NON-USE OF INTELLIGENCE TESTS BY OHIO COLLEGES IN 
1923-24 

Part A. Names of Colleges Which Used Tests in 1923-24 

                                                         BEGAN USE OF 
UNIVERSITY OR COLLEGE                                    TESTS IN: 
 1. [illegible]..........................................  1921 
 2. [illegible]..........................................  1921 
 3. [illegible]..........................................  1922 
 4. Case School of Applied Science*......................  1919 
 5. [illegible]..........................................  1922 
 6. Cleveland School of Education........................  1920 
 7. [illegible]..........................................  1923 
 8. [illegible]..........................................  1923 
 9. [illegible]..........................................  (Unofficial) 
10. [illegible]..........................................  1919 
11. [illegible]..........................................  1923 
12. [illegible]..........................................  1919 
13. [illegible]..........................................  1919 
14. [illegible].......................................... 
15. [illegible]..........................................  1920 
16. [illegible]..........................................  1922 
17. [illegible]..........................................  (Unofficial) 
18. [illegible]..........................................  1922 
19. [illegible]..........................................  1919 


In 1924-25, the following colleges are known to have adopted the use of tests 
(these names appear therefore in the next list also): 
Bluffton College 
Capitol University 
Heidelberg University 
Kenyon College 
Marietta College 
Ohio Northern University 
St. Johns University 





* Those 16 colleges which are starred are members of the Ohio College Associa- 
tion. Eleven members of the association in 1923-24 did not use tests; while 2 
members of the association did not reply. Five members of the association 
adopted tests in 1924-25, consequently in 1924-25, there are at least 21 of the 28 
members of the association using tests. 

Part B. The Names of Colleges Not Using Intelligence Tests in 1923-24 

 1. Bonebrake Theological Seminary (Hope to use tests on return of instructor 
    from Yale) 
 2. Bluffton College (adopted tests in 1924-25) 
 3. Capitol University (adopted tests in 1924-25)¹ 
 4. Cedarville College 
 5. Central Theological Seminary of the Reformed Church in the U. S. 
 6. College and Academy of the Sacred Heart 
 7. Dayton University¹ 
 8. Eclectic Medical College 
 9. Findlay College 
10. Hebrew Union College 
11. Heidelberg University (adopted tests in 1924-25)¹ 
12. Kenyon College (adopted tests in 1924-25)¹ 
13. Marietta College (adopted tests in 1924-25)¹ 
14. Mount St. Mary’s Seminary of the West 
15. Mount Union College 
16. Ohio Northern University (adopted tests in 1924-25) 
17. Otterbein College¹ 
18. Oxford College for Women 
19. Rio Grande College 
20. St. Mary’s Theological Seminary 
21. St. John’s University (adopted tests in 1924-25)¹ 
22. St. Xavier College¹ 
23. Toledo University (used tests in Psychology Department in 1923-24)¹ 
24. Western Reserve University¹ 
25. Wilberforce University 


Part C. The Names of Colleges Which Did Not Answer Questionnaire 

Ashland College 
Cincinnati College of Pharmacy 
Cincinnati Veterinary College 
Dennison University¹ 
John Carroll University² 
Lane Theological Seminary 
St. Ignatius College 

¹ Indicates members of Ohio College Association. 
² Through oversight, no questionnaire sent. 

As in the case of the use of tests nationally, the growth of the use 
of tests in Ohio has been steady and continuous. In the case of those 
colleges answering the questionnaire (44 colleges, 87 per cent of all 
colleges), 

In 1919-20 there were 5 colleges using intelligence tests 
In 1920-21 there were 7 colleges using intelligence tests 
In 1921-22 there were 9 colleges using intelligence tests 
In 1922-23 there were 13 colleges using intelligence tests 
In 1923-24 there were 16 colleges using intelligence tests 
In 1924-25 there were 26 or more¹ using intelligence tests 
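
The yearly counts through 1923-24 can be recovered from the "Began use of tests in" column of Table VII, Part A. The sketch below is illustrative only: it takes the 16 dated entries as transcribed from that table, leaves out the two unofficial users and the one college with no date, and accumulates them by year.

```python
from collections import Counter
from itertools import accumulate

# Starting years of the 16 dated official users, in the order of Table VII, Part A
START_YEARS = [
    1921, 1921, 1922, 1919, 1922, 1920, 1923, 1923,
    1919, 1923, 1919, 1919, 1920, 1922, 1922, 1919,
]


def cumulative_users(years):
    """Map each starting year to the number of colleges using tests by that year."""
    counts = Counter(years)
    ordered = sorted(counts)
    running = accumulate(counts[year] for year in ordered)
    return dict(zip(ordered, running))


if __name__ == "__main__":
    for year, total in cumulative_users(START_YEARS).items():
        print(f"In {year}-{(year + 1) % 100:02d} there were {total} colleges using tests")
```

Adding the two unofficial users and the undated college to the 16 gives the 19 users of 1923-24, and the 7 colleges known to have adopted tests in 1924-25 bring the figure to 26.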


The Army Alpha was given more frequently than any other test. 

The tests were used quite widely. Only 5 colleges report using 
the tests for entrance purposes. The tests were used administratively 
on an average of 4.7 times per college by those colleges which reported 
the uses made of the tests. Eight colleges out of 19 report using the 
tests for purposes of sectioning students according to capacity for 
progress. 


SUMMARY OF THE USES OF INTELLIGENCE TESTS IN 66 OF 110 AMERICAN 
UNIVERSITIES 


1. The growth of the use of intelligence tests has been steady and 
continuous since the war. Sixty-six out of 110 colleges sent the 
national questionnaire gave tests in 1923-24. 

2. Smaller colleges, with lack of testing personnel do not use 
tests to the same extent as the medium-sized colleges. 

3. The large colleges have administrative difficulties in getting a 
testing program to function efficiently. 

4. The Thorndike test was the most widely used test in 1923-24, 
displacing the earlier lead of the Army Alpha. 

5. The longer tests have the higher validity coefficients. 

6. The median validity coefficient of 43 colleges is .46. 

7. Twenty-three colleges do not know the validity of the tests 
which they use. 

8. No college uses intelligence tests as a sole basis for entrance. 

9. Tests are used for entrance in only 19 cases out of 341 other 
aggregate uses made of tests by 66 colleges. 

10. Tests are thus primarily a pedagogical and administrative 
device, and in comparison are little used for entrance purposes. The 
three most frequent uses in order are: (a) in determining dismissal for 
low scholarship; (b) in encouraging extra effort in the case of unmoti- 
vated bright students; and (c) in determining amount of school work a 
student shall be allowed to carry. 


¹ One did not report date. 

11. Those colleges which use tests for admission purposes use them 
just as much for other pedagogical purposes as colleges which do not 
use them for admission. 

12. Probably due to scheduling difficulties, sectioning students 
into classes of differing brightness has been slow in getting under way. 

13. More sectioning is done in English than in any two other 
university departments combined. 

14. Entrance upon the basis of intelligence tests invariably takes 
high school scholarship into account, and frequently other factors also. 

15. Only 26 per cent of colleges doing testing have the tests given 
before the opening day of school. The consequent late date of avail- 
ability of test records makes sectioning of entering freshmen an 
impractical matter. 

16. A number of colleges give over 1000 tests annually, while some 
run up into several thousand. With the introduction of mechanical 
machines for tabulation of the results, many extensive researches now 
impossible can be readily made. 

17. Seventy-three per cent of the colleges using tests give out to 
students the results of their tests. 

18. The difficulties involved in giving out scores and interpreting 
them to students are being satisfactorily met by a majority of colleges. 

19. Every second college has a demand for at least two sets of the 
psychological test records for the use of different administrative 
officers; President, Dean, student advisors, Registrar, appointment 
office, etc. The problem of duplication of test files is an insistent one, 
especially in the larger colleges where this difficulty holds back the 
introduction and the adequate functioning of tests. 

20. Most colleges are relying on commercialized tests, seldom 
available in more than two forms, too easy and not specifically con- 
structed for college testing. 

21. The results, both as to validity and reliability, achieved by 
certain colleges point out the desirability of general adoption of tests 
of at least two hours in length. 

22. College marks have a higher reliability than might be expected, 
an average reliability of .66 between first and second semester marks. 
Nevertheless, their improvement is without a doubt, one of the next 
most important steps in raising the validity of predictions made from 
college intelligence tests. 

Much research can evidently be profitably spent at the present 
time on the improvement of college marks. We need to know how they 
vary as dependent upon courses, instructors, number of years in 
college, by colleges within a university, by social groups (other than 
in Phi Beta Kappa and the Athletic Department). We need to know 
how often to give examinations and of what form, content, and 
difficulty. 


SUMMARY OF THE USES OF INTELLIGENCE TESTS BY 19 OUT OF 51 
OHIO COLLEGES IN 1923-24 


1. The first five questions only of the questionnaire were sent to 
all Ohio colleges in order to obtain a complete sampling of one geo- 
graphical district. 

2. Nineteen colleges of the 51 used tests in 1923-24. At least 7 
more are known to have adopted tests in 1924-25, thus bringing 
the percentage up to over 50 per cent of all colleges of the State. 

3. At the present time 21 members of the 28 colleges belonging 
to the Ohio College Association use tests. This suggests the desira- 
bility of cooperation in research work through the aid of the 
association.¹ 

4. The growth of use of tests in the colleges of the State has been 
continuous and steady. 

5. The Army Alpha was given more frequently than any other 
test. This test is too easy. 

6. Only 5 out of 19 colleges report using the tests for entrance 
purposes. 

7. Each college, on the average, uses its tests for 4.7 separate 
purposes. This frequency is about the same as that noted in the 
national questionnaire. 

8. Eight colleges out of 19 (42 per cent) report using the tests for 
sectioning purposes. As in the case of the national questionnaire the 
department of most frequent sectioning was English. 

9. In general, the results noted do not differ from the corresponding 
results in the national questionnaire, except to indicate that the 
percentage (60) of colleges using tests found in the national 
questionnaire is probably too high. A safer national estimate would be 
that in 1924-25 probably slightly less than half of all colleges and 
universities in the country use tests. This answers the general 
question of the extent of use of tests. 


¹ The Ohio College Association at its meeting April 4, 1925 voted to approve 
“the general adoption by its member colleges of a uniform intelligence test for 
research purposes.” 
10. As to the general uses made of tests, all results indicate that 
they are generally thought of as an administrative and pedagogical 
device not so much for use in keeping out undesirable students 
(although considerably used for that purpose) but rather for helping 
to make better adjustments to those who are already in the college. 

STUDIES OF ACHIEVEMENT TESTS 
BEN D. WOOD 


Columbia College, Columbia University 


(Continued from January Issue) 
Part II 


The argument of validity vs. reliability may easily be carried over 
to the problem of the relative merits of various forms of questions. 
Toops¹ and Ruch² have done notable work on this question. These 
investigators selected sets of 50 and 100 items of a general information 
character which they adapted to three and five question-forms: Recall, 
five-answer, three-answer, two-answer and true-false. These forms of 
the same items were then administered to students in such a way as to 
equalize and minimize practice effects and otherwise preserve compara- 
bility of results and relationships. Ruch’s general conclusion was that 
as to reliability per unit of time, the recall, five-response and 
two-response forms are somewhat superior to the T-F and three-response 
forms. Ruch thus gets results at variance with those of Toops, who 
found the T-F somewhat better than the recall and five-response forms. 
It is quite likely that this disagreement is explained by the higher 
intellectual range concerned in Toops’ study and by the somewhat 
broader and more general character of his information items. In 
other words, the differences may be due very largely to differences in 
what was measured, the reliability differences being incidental and 
unimportant relatively to validity differences. We have already 
noted that T-F questions of a problematic character seem to behave 
somewhat differently from those which approach more nearly to the 
simple “know or don’t know” information type. It is well, therefore, 
to keep in mind the conditions of Toops’ and Ruch’s experiments, as 
clearly set forth by them. Both dealt with factual types of items, 
but Toops’ items were spread over a much broader field, from music 
and astronomy to stock market quotations; Ruch dealt with high 
school pupils and Toops with college graduates. The present study is 





¹ Toops: Trade Tests in Education. Teachers College, Columbia University, 
1921: Contributions to Education No. 115. 
² Ruch: “Improvement of the Written Examination.” Scott Foresman and 
Co., 1925. 
based on data from tests and subjects which have little in common 
with those used by Toops and Ruch. These differences seem fundamental, 
and it is therefore doubtful whether we should think of the 
results of any of these studies as being in conflict; it seems more likely 
that each study is an approximate description of the behavior of 
particular sets of questions administered to particular groups of sub- 
jects under particular conditions. 

Just as it is important that we should not overestimate the impor- 
tance of reliability as such, so is it important that we should not think 
of the external form of the questions in a test as the most important, 
much less as the sole, determiner of the value of the test. Both Toops 
and Ruch, in their careful descriptions of the conditions and in the 
illustrations of the character of their test items, guarded against 
generalizing about the whole family of questions which masquerade 
under the T-F cloak, or under the multiple choice or recall forms, as 
well as against over-generalizing about the behavior of a specific set of 
items on the basis of its behavior with a particular intellectual or 
educational class of subjects. 

The essential character of a question, and consequently its “statistical 
behavior,” depends upon its content as well as upon its external 
form. The content of a question is often very subtle; many of the 
“simplest” questions are found on inspection to be compounds of 
many subtleties. It seems quite reasonable to suppose that within the 
limits of accepted forms of questions, the “inner” nature of a question 
is more important than its “outer” form; although as a matter of 
common sense, one may say that the “outer” aspect exerts more influence 
as we go down the intellectual or educational ladder and less as we 
ascend it. 

The fact that entirely different kinds of questions may have the 
same external form becomes clear in a comparison of some of Toops’ 
and Ruch’s true-false questions with some from the examinations used 
in the present study and from other T-F problem tests. 


I. General Historical Information (Ruch) 


The American Revolution began in 1775. 

The Civil War ended in 1867. 

Martin Luther was a reformer. 

The War of 1812 was between France and the United States. 
Bismarck’s chief work was the unification of Germany. 

II. General Information (Toops) 


The letter G is the note which is on the bottom line of the staff in music. 

The smallest fraction of a cent in which stocks are quoted is 9. 

It takes light eight minutes to travel from the sun to the earth. 

The statue of “The Thinker” was made by Bartholdi. 

The nationality of William of Orange was Dutch. 


III. Law of Real Property (Powell) 


Before the Statute De Donis A conveyed by feoffment with livery of seisin to B 
and the heirs of his body. B had a son, C. B died leaving C as his sole 
issue. C later died without issue. A is entitled to have the property. 

The Statute Quia Emptores facilitated the alienation of land. 

(Facts for 111-112: A to B and the heirs of his body. At the present 
time in Massachusetts and other jurisdictions according like legal 
consequences to this limitation.) 

On the extinction of B’s line, if no conveyance of the property has been made, 
the property reverts to A or his heirs. 

On B’s death, all of B’s children divide the property equally between them. 

If the wife was actually seised of any estate of inheritance, at common law, 
the husband always became entitled to curtesy initiate upon the birth alive 
of the first child of the union. 

The rule recognized in Taltarum’s case increased the alienability of fees tail. 


IV. Physics Problems (Farwell) 


Power is the ability to do work. 

The mechanical advantage of a machine is the ratio of the work produced to 
the work put in. 

(Problem for statements 41-48: Two boys are on a see-saw, a plank which 
has been set so that its center of gravity is over the support. One 
boy, A, weighs 100 lbs. and sits 9 feet from the support. The other boy, 
B, balances A by sitting 12 feet from the support.) 

Taking the support as an axis, the moment of force due to A’s weight is equal 
numerically to that due to B’s weight. 

The forces exerted on the plank by the two boys must be equal. 

B weighs ¾ of A’s weight. 

If the supporting edge is taken as the axis, the moment of force due to the 
support is zero. 

If one boy moves 12 inches nearer to the support the other will need to move 
the same distance in order to maintain the balance. 


V. Plane Geometry (Hawkes-Wood) 


1. If the four sides of one quadrilateral are equal respectively to the four sides 
of another, the quadrilaterals are congruent. 

2. If any angle of an isosceles triangle is 60°, the triangle is equilateral. 

3. If the diagonals in a quadrilateral meet at right angles, the figure is a 
parallelogram. 

4. If two circles intersect, the line of centers is bisected at right angles by their 
common chord. 

5. If two equal or unequal circles are tangent externally, two tangents drawn to 
them from any point in their common internal tangent are equal. 

6. A line intersects two sides of a triangle. The lengths of the four corresponding 
segments formed are respectively 10, 6, 18 and 9. The intersecting line is 
parallel to the third side of the triangle. 

7. A line cutting two sides and parallel to the bases of a trapezoid divides it into 
similar trapezoids. 

These groups of questions, in spite of their external similarities, are 
quite different, not merely as to specific content, but as to the quantity 
and quality and variety of intellectual opportunities and difficulties 
which they present. The differences here displayed between groups of 
questions, all of which are within the T-F type, are large enough as 
compared with the differences between two forms of the same 100 
items, to constitute a serious warning against generalizing about 
external type-forms without taking strict account of the content, 
internal natures and levels of difficulty and complexity of the questions. 
It seems wise to emphasize this warning because some readers have not 
sufficiently appreciated the carefully described conditions of Toops’ 
and Ruch’s experiments to save them from making inferences from 
these two studies to which neither of their authors would for a moment 
subscribe. 

We may display the same kinds of differences within the two-answer 
and within the three-answer forms. 


Two-response Samples 


1. The American Revolution began in: 1762 1775 

2. The instillation of a few drops of a 2 per cent solution of cocaine hydrochloride 
into the conjunctival sac of the normal eye of the rabbit causes (dilatation, 
contraction) of the pupil. The (mydriatic, myotic) effect is brought about 
by an action on the (nerve ends, myoneural junctions) of the (parasympathetic, 
sympathetic) division of the autonomic nervous system, which innervates 
the (sphincter, dilator) muscle of the iris. The reaction to light is 
(lost, retained). Accommodation (is, is not) paralyzed. 


Three-response Samples 


1. The Civil War ended in 1861 1865 1869 

2. Directions: Tabulate the effects of the drugs heading the several columns 
on the state or condition of the functional activity of the structures, organs, 
tissues, etc., listed below. If effective therapeutic doses, administered in the 
ideal way, increase the functional activity, use “+;” if they decrease functional 
activity use “−;” if the drug has no effect on functional activity write “0.” 
Complete Column 1, as far as you can, before beginning column 2; then 2, 
then 3, etc. 


                           | Epinephrine | Pilocarpine | Atropine | Nitrites | Strophanthin 
Splanchnic arteries        |             |             |          |          | 
Splanchnic veins           |             |             |          |          | 
Mammalian heart, in situ,  |             |             |          |          | 
  nerves intact            |             |             |          |          | 
Mammalian heart, in situ,  |             |             |          |          | 
  denervated               |             |             |          |          | 
Coronary blood vessels     |             |             |          |          | 


Illustrations of this sort might be multiplied indefinitely, if the 
case required them and space permitted; but these are quite sufficient 
to make it clear that “true-false” and “two-answer” types, for example, 
do not mean types of questions which are so restricted that we 
can safely generalize about them when used for testing certain types 
of information in certain ranges of intelligence and education. 

Another reason for emphasizing this point is the fact that objective 
forms of questions have been too much associated with simple informa- 
tion inventories. It is perfectly legitimate to use objective questions as 
information inventories, just as the eliciting of pure information items 
on the old type essay examinations is perfectly legitimate; but we must 
cease to regard the idea “objective test” as synonymous with the idea 
“information test,” just as we must cease to regard the old type essay 
examination as ipso facto a reasoning test. There is some evidence to 
the effect that such values in some old type tests as do escape the dis- 
tortion of subjective scoring are due to the fact that they do measure 
information and do not measure reasoning ability. No invidious com- 
parison between information and reasoning tests is intended here. 
On the contrary it is the opinion of the writer that both teachers and 
laymen have placed “memory” and “reasoning” in a false hierarchical 
relationship, as far as testing devices are concerned. The important 
thing to realize here is that objective questions arrange themselves 
within each external form in a hierarchy of difficulty and complexity 
just as the old type tests do; and that the differences in the statistical 
behavior of questions in the same external form may be as great as, 
or greater than, the differences between questions in different external 
forms. 
A question involving only a simple fact of information may, with 
ingenuity, be put into any one of a dozen or so external forms, not 
counting variations in vocabulary. But with more involved and 
problematic kinds of questions, and even with some very simple infor- 
mation items, the transformations can be accomplished only at the 
expense of varying degrees of mutilation of the essential nature of the 
question. Thus, while Ruch and Toops had little difficulty in adapt- 
ing 100 and 50 information items, respectively, to several external 
forms, it would certainly be no easy task to parallel their experiment 
in the field of, let us say, law or physics or geometry, starting with 
the questions already reproduced here. Questions in these subject- 
matters can be framed in several external forms, but it is at least a 
matter of doubt whether any one of these questions, once framed, can 
be exactly transcribed into another form. Even among Ruch’s transformations 
it is, in some cases, uncertain whether the question or its 
form has been changed more. It is certainly not the writer’s desire to split 
hairs or be speciously critical, but he is convinced that we need 
to pay more attention to the subtleties of questions in order to improve 
the validity of our tests. We must try to make sure what we are measuring 
as well as to increase the reliability of our measuring. In changing 
the external form of a question a little we may unwittingly change its internal 
constitution a great deal, by addition or subtraction or substitution of 
elements, so that we have, not an old question in a new form, but a 
new question having little in common with its supposed ancestor. A 
bond analysis of various metamorphoses of the “same item” might 
reveal differences which would far outweigh the differences in the 
external forms, as such. 

We may here consider the metamorphoses of selected items from 
the tests used by Toops and Ruch in their experiments. There is no 
question here of criticising the work of these careful scholars. Items 
from their lists are chosen because no other metamorphoses are avail- 
able, and because if genuine metamorphoses are at all possible, few 
would come nearer to making them than these investigators. The 
writer’s purpose being to illustrate concretely what is offered as a 
constructive suggestion, he has chosen items from their lists which 
lend themselves most effectively to what might otherwise be con- 
sidered destructive criticism. For the same reason the writer omits 
reference to all but the five-response form of the multiple-choice forms, 
and confines himself to Recall, five-response and T-F forms. 

First Sample Item: Martin Luther 


1. The reformer chiefly responsible for the reformation in Europe was...... 
2. Martin Luther was a missionary, pope, philosopher, soldier, reformer. 
3. Martin Luther was a reformer, (true) (false). 


No. 1 involves mainly the identification of the phrase (a) “reformation 
in Europe” and the selection of one in the class (b) “reformers” 
who was (c) “chiefly responsible for” (a). If (a) is “correctly” 
identified (there have been several movements in Europe which might 
be called Reformations), the student has to choose among several 
sixteenth century personages in the class (b). (b) itself presents 
special difficulties: if it is used in the technical sense of one who 
participated in that variously appraised convulsion which Europe survived 
only at the cost of much destruction of life and property and 
enduring hatreds, then the student must choose from among such as 
Luther, Calvin, Henry VIII, Loyola, and others, the one most responsible. 
If “reformer” carries with it any sense of real constructiveness, 
untainted by great destructiveness and base motives, we should 
at least have to exclude Henry VIII, and perhaps even Luther, in 
view of the bloody wars and cruelties which he helped to instigate. 
After all these difficulties are met, the training due to church affiliation 
of the student might determine which of the four persons named would 
be considered chiefly responsible. 

In No. 2 we have a very different situation, very much less com- 
plicated. We have one person to consider, Luther; and we have only 
to consider which of five words best fits him. The reformation does 
not appear in this question, nor does the idea of chief responsibility 
occur. But the fact that Luther is the sole cue presented does not 
avoid a very delicate problem for the student. It is clear that, in 
a very true sense, Luther belongs in at least four of the five categories 
named. Unless “reformer” is interpreted by the student to mean one 
who took a great part in the sixteenth century politico-ecclesiastical 
revolution in Europe, the student who knows most about Luther would 
almost certainly answer “wrongly.” Even if this interpretation 
is made, there is still something to be said for “missionary.” 

In No. 3 the matter is still more simple; there is no reference to 
reformation or chief responsibility; the whole matter hangs on two 
pegs, Luther and reformer. Here the matter of interpreting reformer 
becomes acute, because it is on a different plane from that which the 
word occupies in the first two forms. In No. 3 there is not the slightest 
suggestion that the word is used in the technical sense of one who had 
an active part in the Reformation of 1520-1570 A. D. In Nos. 1 and 
2 the words reformation and missionary and pope at least suggest an 
ecclesiastical setting for the word reformer. In No. 3 the student 
would be perfectly justified in taking the word in its ordinary meaning, 
and also in answering the question as false on the basis of Luther’s 
intentions, his intolerance, and the dreadful immediate effects of his 
revolt from the Roman Church and his new political affiliations. 
Indeed, the best informed and most thoughtful student would be the 
very one who would assume that the real point of the question (No. 
3) was to distinguish between the two senses of the word, and to commit 
himself on the very debatable question whether Luther was really 
a reformer or whether he was merely a Reformer. 


Second Sample Item 


4. The man who betrayed Jesus was ................. 
5. Jesus was betrayed by Herod, Judas, Pilate, John, Lazarus. 
6. Jesus was betrayed by Pilate (true) (false). 


In No. 4 Judas, Peter and Pilate would each be correct, but due 
to the general tradition the student would probably not think of the 
weakness of Peter or of the horrible crime of Pilate in betraying a 
man in whom he found no evil to the anger of a blood-thirsty mob; 
even if the student did think of all three betrayers, he would in all 
probability select Judas. 

In No. 5, however, the problem of choosing between Judas and 
Pilate seems to be the main point. Since the two are in close juxta- 
position, the intelligent student would be stimulated to review in his 
mind the relative enormities of the crimes of Judas and Pilate, and if 
well enough informed to recall the triple character of Pilate’s betrayal 
and the remorse of poor Judas, he would underscore Pilate. 

In No. 6, there is no choice; we have only the simple question, 
Was Jesus betrayed by Pilate? It is not a question of the worst or most 
important or least unsung or most condonable of those who denied or 
sold Him for the sake of social approval or political or economic advantage. 
The question is simply, Did Pilate’s conduct constitute a 
betrayal, did he sacrifice a just and guiltless man for a reward? 


Third Sample Item 


7. The German Chancellor largely responsible for the unification of Germany 
was ...... 
8. Bismarck’s chief work was the unification of Germany, establishment of schools, 
establishment of freedom, a great navy, industrial development. 
9. Bismarck’s chief work was the unification of Germany (true) (false). 

In No. 7 the student must identify the unification of Germany and 
the one in the class German Chancellors who was largely responsible; 
the question is a clear-cut information item, in which there is little 
room for judgment or reasoning or balancing of rival claims. 

In No. 8 we have several points of information and a rather difficult 
judgment to make; just what did Bismarck have to do with schools, 
navy, freedom, industry and unification? These points having been 
determined, which constituted his CHIEF work? What are the 
criteria for determining the chief work of a man? Permanence? 
Fame? Achieving something which would have soon occurred in 
any case? Or something which would never have come about without 
his efforts? 

In No. 9 we have the same fundamental judgment to make, as in 
No. 8, the main difference being that no rival works are positively 
suggested. No. 8 and No. 9 are very nearly alike, but very different 
from 7. 


Fourth Sample Item 


10. The American Revolution began in the year ......... 
11. The American Revolution began in 1762, 1775, 1783, 1789, 1812. 
12. The American Revolution began in 1775 (true) (false). 


Nos. 10, 11 and 12 all begin with “The American Revolution,” but 
each one contains different suggestions as to how the phrase “American 
Revolution” is to be interpreted. It may mean the war of the American 
Revolution, in which case the year of the first military engagement 
would be the correct answer; this interpretation is suggested in No. 10 
by the phrase “began in the year.” On the other hand, good students 
learn that the affair of the embattled farmers was but a symptom of 
a movement that began long before; and if American Revolution means 
this movement, rather than the war which the movement produced, 
then 1775 would be a wrong answer. In 11, aside from the slight 
catch in 1812, the choice narrows to 1762 and 1775; the best students 
would answer with the earlier date oftener than the poor students. In 
No. 12 no suggestion is offered tending to make American Revolution 
mean the Revolutionary War; the good student will rather take this 
as a challenge to his historical insight and will not be misled by 1775; 
he will consider rather the question whether any large movement of 
this sort can be said to begin at any definite date, and would probably 
mark any date more definite than a quarter-century or decade as wrong. 

Fifth Sample Item 


13. What letter designates the note on the bottom line of the staff in music?...... 

14. What letter designates the note on the bottom of the staff in music? Ans: 
A, E, G, B, C. 

15. The letter G is the note which is on the bottom line of the staff in music 
(true) (false). 


In No. 13 the student has no suggestion as to the rival claims of the 
various clefs, and will feel no hesitancy probably in answering accord- 
ing to the first clef that enters his mind; sopranos would probably 
answer most often with E, and others with G; persons trained in music 
would probably put two or three answers, all of which would be correct. 

In No. 14 we have two clefs definitely placed in rivalry, and also the 
question of the interpretation of the phrase bottom of the staff. Is the 
note on the space under the bottom line on the bottom of the staff or not? 
Of the five alternatives, E is correct for the treble clef, and G for the 
bass clef. 

In No. 15 we have a statement which is true for the bass clef, but 
false for the other commonly used clefs. 


Sixth Sample Item 


16. How many minutes does it take light to travel from the sun to the earth?...... 

17. How many minutes does it take light to travel from the sun to the earth? 
Ans: 4, 8, 12, 16, 20. 

18. It takes light 8 minutes to travel from the sun to the earth (true) (false). 


In No. 16, since no time of the year is given, and no suggestion is 
given that the approximate average is what is desired, the student who 
knows his astronomy will answer “between 7 and 9 minutes with 
average about 8.” That is, the recall form here presents a real problem 
to the enlightened mind, and permits such a mind to indicate 
the fact. It is not a simple information item, as it undoubtedly is in 
No. 17. In No. 17 there is no escape from the five choices presented, 
and 8 being the best choice, the student who has more than a simple 
memory must nevertheless answer in the same way as the student who 
has only a vague recollection that 8 sounds more reasonable than 
4 or 12. 

In 18, since no suggestion is given that an approximate average is 
desired, the best informed student will be inclined to interpret the 
statement rigidly and will mark the statement as false on two grounds, 
first that nothing is said about the average time required, and second, 
if average is meant, 8 minutes is not exactly correct; while the dull 
and careless student, who has at some time or other heard that at 
certain times of the year the eclipses of Jupiter’s moons appear to be 
delayed about 16 minutes, will mark the statement as true. 
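
The remark that 8 minutes is not exactly correct may be checked by a short computation. The sketch below is offered only as an illustration: the distances and the speed of light are modern round figures assumed for the purpose, not values given in the studies discussed, and the function name is hypothetical.

```python
# Modern round values, assumed for illustration (not taken from the studies).
SPEED_OF_LIGHT_KM_PER_S = 299_792.458
SUN_EARTH_DISTANCE_KM = {
    "nearest (perihelion)": 147.1e6,
    "mean": 149.6e6,
    "farthest (aphelion)": 152.1e6,
}


def light_travel_minutes(distance_km: float) -> float:
    """Return the time, in minutes, for light to cover distance_km."""
    return distance_km / SPEED_OF_LIGHT_KM_PER_S / 60.0


if __name__ == "__main__":
    for label, km in SUN_EARTH_DISTANCE_KM.items():
        print(f"{label}: {light_travel_minutes(km):.2f} minutes")
```

On these assumed figures the time remains somewhat above 8 minutes throughout the year, which is why a rigid reading of statement No. 18 can fairly be marked false. 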

These samples are sufficient to illustrate concretely the need for 
considering carefully what each question asks as well as what external 
form it takes, and also the need for careful initial construction of 
questions, whatever their external forms may be. It is quite likely 
that each of the accepted forms has unique values, and that therefore 
all forms are capable of great utility if we only knew their unique 
traits. If we should make a complete list of all the valid questions 
that might be asked in a given field of information and thought, we 
would probably find that each question by its individual internal 
constitution would dictate the form in which it should be presented 
to the examinee. Such a list would in all probability call for every 
known form of question. The important thing to observe is that no 
question should be forced into a form at the expense of mutilation 
which destroys its validity, or which lessens its validity without 
balancing compensations. Thus, the true-false form, item for item, 
seems by all the evidence to possess less reliability and less validity 
than several other objective forms; but it has compensations in terms 
of sampling and scorability which apparently place it on a parity 
with the recall form when “reliability and validity per unit of examination 
time” is the criterion of excellence. Certain types of simple 
information items, and some more complex, may fit into two or three 
forms equally well. The rule here would seem to be to choose that 
form which Toops and Ruch and others have shown to be most econo- 
mical per unit of examination time, and most convenient, all things 
considered. 
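
How a form that is weaker item for item may nevertheless reach parity per unit of examination time can be shown by a small calculation. The sketch below uses the Spearman-Brown formula, which is this illustration's own choice and is not named in the discussion above; it assumes that the items of a form are interchangeable. The single-item reliabilities and the numbers of items answered per hour are hypothetical figures chosen only to show the effect, and the function names are likewise hypothetical.

```python
def spearman_brown(unit_reliability: float, length_factor: float) -> float:
    """Projected reliability of a test length_factor times as long as a
    unit whose reliability is unit_reliability (Spearman-Brown formula)."""
    r, k = unit_reliability, length_factor
    return k * r / (1.0 + (k - 1.0) * r)


# Hypothetical figures, for illustration only: reliability of a single item
# and number of items a student can answer in one hour, for each form.
FORMS = {
    "recall": {"item_reliability": 0.05, "items_per_hour": 60},
    "true-false": {"item_reliability": 0.03, "items_per_hour": 100},
}


def one_hour_reliability(form: str) -> float:
    """Projected reliability of a one-hour test built in the given form."""
    spec = FORMS[form]
    return spearman_brown(spec["item_reliability"], spec["items_per_hour"])


if __name__ == "__main__":
    for name in FORMS:
        print(f"{name}: {one_hour_reliability(name):.3f}")
```

With these assumed figures the single true-false item is the less reliable, yet the two one-hour tests come out nearly equal, because more true-false items can be marked in the hour. 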

The reader will mistake the point of this argument if he takes 
anything that has been said as questioning the value of the studies of 
the comparative reliabilities of various forms of examination questions. 
We need more research along the lines in which Toops and Ruch have 
been pioneers. We need experimental evidence on both comparative 
reliabilities and comparative validities (a) of various forms of questions 
at each of several levels of difficulty and complexity, and (b) of 
questions at various levels in the same external form. 

Under (a) it would be desirable to parallel Toops’ and Ruch’s 
experiments at various levels above the simple information plane. 
As indicated above, it might be difficult or impossible to frame the same 
item at high levels of complexity in many forms, but this would not be 
greatly vitiating so long as we have proper regard for validity. Indeed 
if we are mainly interested in validity per unit of examination time, there 
is no necessity for trying rigidly to make the various forms of an item 
identical in content, at any level of difficulty and complexity. We 
should simply desire to find out for a given level, e.g., simple informa- 
tion, the comparative validities of, let us say, 


a one-hour recall test 
a one-hour 7-response test 
a one-hour 5-response test 
a one-hour 3-response test 
a one-hour true-false test 


when each of these one-hour tests is the best one-hour test of informa- 
tion in the given field that the form of the question in each test and the 
skill of the maker makes possible. 

We would also like to learn the comparative validities of these 
various forms of a one-hour test at other levels, e.g., in difficult and 
complex problems in physics, law, chemistry, pharmacology, etc., 
when each such one-hour test is the best problem test that can be put in 
that form of question, regardless of whether or not exactly the same 
items occur in each one-hour test. 

Limitation to precisely the same items, question by question, in 
different forms of a one-hour test would in any case be impossible, since 
there can never be as many recall questions as true-false questions in a 
one-hour test. To impose rigorously the conditions of identity of 
elements in such experimentation would rob the true-false form of 
one of its main powers—breadth of sampling made possible by the 
speed with which students mark true-false statements; and would 
probably also limit the functioning of whatever unique subtleties 
inhere in the several other forms of questions. 

Similarly, under (b), we should like to learn the comparative validi- 
ties in terms of specific criteria, of various kinds and levels of true-false 
questions and of various kinds of recall questions, and of various kinds 
of multiple choice questions, etc. For example, what are the relative 
validities of 


a one-hour true-false test in history made up wholly of difficult judgments of 
characters, causes, effects, tendencies, relationships, moral appraisals and 
philosophical implications, etc.; and 

a one-hour true-false test made up wholly of simple factual items of historical 
information?

What are the relative validities of a one-hour true-false test of legal
information, and a one-hour true-false test in legal problems? What 
are the relative validities in terms of specific criteria of a one-hour 
recall test of information about plane geometry, a one-hour recall test 
of computation in plane geometry, and a one-hour recall test of original 
propositions and problems in plane geometry? 

Further, an order of merit of forms of questions, or of different kinds 
of questions within a single form, determined in one field of knowledge 
or thought, with a given group or class of examinees may not, and 
probably will not, be the same for another field of knowledge or thought 
or for an older or younger, or more or less intelligent, class of examinees. 
There is need therefore for research of this sort in various subject- 
matters and at various intellectual and educational levels. 

We have already suggested the principle that a question which 
has once been decided upon as being valid and worth asking should 
not be forced into a form so uncongenial as to notably lessen its validity 
—this both to save good questions and to protect “forms” from
unnecessary restrictions. 

In testing certain functions or abilities or skills the problem of 
choosing from among several forms of questions may never arise, 
simply because it may be actually impossible to ask a given question 
in more than one way. It may be, and perhaps often is, possible 
to ask another question in another form which is just as valid; and 
in such cases administrative convenience and individual skill in 
making up questions in a given form may very largely dictate the 
form of questions in a particular examination. But if a particular
question inherently belongs in a particular form, it ought, except for 
reasons of weight, to be given in that form. Let us consider an 
example: 


In the equation x/y = z, x, y and z are positive integers and x remains
constant; if y becomes larger, does z become larger or smaller?


This question involves an absolute dichotomy, and as such is
essentially a true-false,¹ yes-no, or two-answer type of question, and
not a recall or multiple choice type. (Of course, for very low-grade
algebra students the third possibility of “no change in z” might put
the question in the three-answer type; but even this would not make it
a recall or multiple choice question.) To put a question to which
there are only two possible answers in, let us say, the recall form,
where the student has to write out “larger” or “smaller” or “no
change,” is merely wasting examination time in irrelevant activities
and putting an additional uncompensated burden on the scoring clerk.

¹ The writer here enters a plea of Caveat Emptor; he is aware that liberties are
taken in the matter of definitions of terms, and that several implications of these
paragraphs will bear discussion. Some of these will be considered in a forthcoming
study.

It is obvious that this question could easily be made over into a 
genuine recall or multiple choice or discussion question, by leaving out 
the statement that x remains constant and by further leaving out or 
modifying the limitation that all three are positive integers; but we 
would have entirely different questions, much more difficult and 
complex. 

In passing we may notice that the question as presented above is a 
two-response (or three-response) question in a way in which a genuine 
recall item is not a two- or three-response question; e.g., 


The victor at Marengo was Moreau, Napoleon, Lafayette. 


To make this question a two-response question similar to the two-
response form of “z becomes larger-smaller” we should have to make
the alternatives “(1) Napoleon, (2) The General in command of the
opposing army at Marengo.”

Examples of genuine three-response questions have been given
in this article. These are enough to illustrate in a very general way 
the principle in question, and to indicate the great need for a thorough- 
going and comprehensive analysis of the whole problem of question- 
making for examination purposes. Such an analysis is being made 
by one of the writer’s associates, Mr. Charles Weidemann. 

In conclusion, we must not allow these technical considerations to 
obscure the one outstanding result of recent research in the field of 
achievement tests: that “new type” tests are much more reliable and
valid than old type examinations. With his usual perspective, Ruch 
has emphasized this fact in his recent book, already cited. Toops 
and Ruch have both emphasized the fact that judgments as to the 
relative merits of question-forms must take account of the time factor, 
i.e., efficiencies must be compared “per unit of examination time”
as well as “per number of questions used.” This factor has apparently
been overlooked by some scholars, notably by those who denounce the 
true-false form on the basis of the statistical behavior of true-false or 
opposites tests of only 50 items. These scholars have also erred in 
emphasizing the difference between true-false reliability and perfect
reliability, instead of comparing true-false reliability with that of its 
rival forms. 

The present study emphasizes validity as the fundamental desider- 
atum in test construction. We have found, in the case of the law 
examinations, an opposition between reliability and validity. Our 
data lead us to believe that, in true-false tests of the kind illustrated 
and administered to the grade of students involved in our experiment, 
the students should be advised not to guess, that the score should be 
R-W, and that the students should know that the score is R-W. 
We have given some experimental evidence (Charts 5-9) and two a
priori arguments in favor of R-W as against Number Right scores.
We leave the tentative inference that these rules are also valid for 
other similar types of questions in various levels of difficulty and com- 
plexity and for other intelligence and educational classes of examinees 
to be verified or disestablished by further appropriate experimenta-
tion. We suggest the need for experimental verification of our tradi-
tional assumptions regarding the absence of chance and “guessing”
in recall and other free-answer types of question. 
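
The two scoring rules contrasted above may be sketched as follows. This is a minimal illustration in Python and not part of the original study: the function names, the use of None to mark an omitted item, and the sample key and responses are all assumptions made for the example.

```python
from typing import Optional, Sequence


def number_right(responses: Sequence[Optional[bool]], key: Sequence[bool]) -> int:
    """Number Right score: count of items answered correctly.

    An omitted item is represented by None (an assumption of this sketch)
    and counts neither as right nor as wrong.
    """
    return sum(1 for r, k in zip(responses, key) if r is not None and r == k)


def rights_minus_wrongs(responses: Sequence[Optional[bool]], key: Sequence[bool]) -> int:
    """R-W score: rights minus wrongs; omitted items contribute nothing."""
    rights = number_right(responses, key)
    wrongs = sum(1 for r, k in zip(responses, key) if r is not None and r != k)
    return rights - wrongs


# Hypothetical five-item true-false test: three right, one wrong, one omitted.
key = [True, False, True, True, False]
responses = [True, False, False, None, False]

print(number_right(responses, key))         # 3
print(rights_minus_wrongs(responses, key))  # 2
```

Under R-W an omitted item costs nothing, while each wrong answer cancels one right answer; under Number Right a wrong answer and an omission are treated alike.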

All our data, in common with those of other investigators, show the 
crucial need for increasing the validity and reliability of our educational 
measurements. Even our best tests afford only approximately accur- 
ate placement of individual students. The needed improvement will 
be hastened by more care in the construction of individual questions, 
by drastically lengthening our examinations, and by using a greater 
variety of appropriate question-forms in them. 
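
The gain in reliability to be expected from lengthening an examination is commonly estimated with the Spearman-Brown formula, r_k = k r / (1 + (k - 1) r), where r is the reliability of the original test and k the factor by which it is lengthened. The passage above does not itself invoke this formula; it is supplied here as an assumption, and the reliability figure in the example is hypothetical.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability of a test lengthened by `length_factor`.

    Standard Spearman-Brown formula; assumes the added items are comparable
    to the original ones.
    """
    return length_factor * reliability / (1.0 + (length_factor - 1.0) * reliability)


# Hypothetical case: a 50-item test with reliability 0.60, tripled to 150 items.
short_form = 0.60
tripled = spearman_brown(short_form, 3)
print(round(tripled, 3))  # 0.818
```

A comparison of question-forms per unit of examination time would apply such a correction to each form according to the number of items of that form that fit in the hour.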


(To Be Concluded in the March Issue)


THE CHARACTERISTICS OF GIFTED INDIVIDUALS


Genetic Studies of Genius. Vol. 1 by Lewis M. Terman and others, 
Stanford University Press, 1925. Pp. IX + 648.


The Genetic Studies of Genius by Terman and his able assistants 
seem so important in aim and so sound in both induction and deduction 
that they merit in a reviewer such support as will increase their renown 
and influence. No more pragmatically significant study is often 
attempted; to increase the number and to improve the education of 
the best portions of our race will affect every human activity from the 
control of the earth’s surface and weather, to the self-purposing evolu- 
tion of the human stock and education of man. In the event of large 
success, nothing human could escape such an influence. 

The first volume is notable in a number of ways. It attacks a 
problem which is logically and practically anterior to most others: 
before the desiderata, the tools and methods; before the tools and 
methods, the tool- and method-makers—the identification, genetic 
selection and education of the best of the race. It combines sound 
mathematical technology (the name of Truman L. Kelley is guarantee) 
with straightforward, intelligible exposition. Various statistical
formulae, not yet all sufficiently current, are so employed and so illus-
trated as to promise wider use; entire distributions, both of frequencies
and of correlations, are presented in astounding topical range; and all
this wealth of data and analysis is set forth by means of a simple, effec-
tive organization of the book into chapters and of each chapter into
text, tables, figures and summary. There are many interpretations, 
and many hypotheses for future research. As good science simply 
told it augments the momentum for popularizing science while as 
Genetic Studies of Genius it improves man’s native and acquired capaci- 
ties to make and to understand good science. 

The results thus far bring new quantitative support to a number of 
conclusions which have often been debated by schools of thought:

Correlations between desirable traits are positive.

Intelligence and character are mutually implied. 

Gifted children are usually normal or better in health, in physique 
and in play. 

Intelligence is hereditary. Geniuses have geniuses as relatives 
more frequently than do others. 

The Stanford-Binet IQ is constant. 

Individual psychograms show special abilities and disabilities. 

The average school achievement of gifted children is between that 
of their age or grade and that of the potentiality. 

It is but natural to congratulate the group of children and parents 
upon the unique privilege, as well as the Commonwealth Fund and 
the Thomas Welton Stanford Fund upon the unique promise, of the 
Genetic Studies of Genius. 

Claparède has reviewed this volume—the purpose and conditions
briefly and the results and conclusions more at length—in Archives
de Psychologie, June 1925, page 266. JOHN P. HERRING.


PSYCHOLOGY APPLIED TO THE NEEDS OF NURSING


Psychology for Nurses, by Maude B. Muse. W. B. Saunders Company,
Philadelphia and London, 1925. Pp. 339. 


This book and the contribution it makes are welcomed as the first 
major contribution of its author. While it makes no pretense of 
offering results of original psychological research in the field of nursing,
its value and significance are, perhaps, just as great. She has accom- 
plished a very difficult task, that of thinking concretely and of adapt- 
ing, illustrating, and applying in an elementary form known psycholog- 
ical laws to nursing situations. This task has been done admirably. 
The product primarily possesses vocational rather than academic 
value. Though current comment stresses the need for contributions 
in pure science, adaptation of that science for immediate use for 
carrying on the world’s work is fully as important. The whole field of 
applied psychology illustrates attempts in this single field. Great 
as these contributions have been they seem rather insignificant, for 
there is so much yet to be done. For example, it is now trite to say 
that much of our so-called educational psychology is nearly as academic 
as the mother science herself. And so a successful, though elemen-
tary, presentation of psychology from the viewpoint of the needs and
interest of nurses is welcomed. 

While the book will be valuable for general reading by nurses and 
as a reference book for courses in nursing education, it is intended pri- 
marily to serve as an elementary text. For such its organization has 
been well planned. Treatment of practically all the major topics in 
educational psychology has been included. Chapters are preceded by 
brief outlines and followed by very satisfactory bibliographies and 
experiments, together with unusually varied and good test exercises. 
The style is attractive, simple, definite, and clear; it is the work of a 
capable writer, wholly at home in her field of nursing education. As 
an elementary text it surpasses the relatively few others in its field 
and as such will be valuable and immediately popular, so much so that 
the reviewer hopes that the author, as well as students of her book, will 
feel that the success of her first book justifies a second, and then, 
many more! EDWIN MAURICE BAILOR.

Dartmouth College. 


MISCELLANEOUS ESSAYS ON PSYCHOLOGICAL TOPICS


Old and New Viewpoints in Psychology, by Knight Dunlap. St. Louis: 
The C. V. Mosby Company, 1925. Pp. 166. 


This volume contains five lectures delivered by the author to more 
or less popular audiences. One of them has already been published in 
the Scientific Monthly. The topics are diverse, and there is, therefore, 
no logical unity in the book. The author is clear and entertaining; 
indeed, he is at his best in dealing with such topics as spiritualism, 
character analysis and Freudianism. He evidently knows these 
topics well and he answers excellently those questions which the lay- 
man is continually propounding to the psychologist with the expecta- 
tion that he can explain these supposedly mysterious phenomena. 

The first essay is on mental measurement and, to the reviewer, 
proved to be the least interesting of the five. It is of the now familiar 
and wearisome “pit-fall” type bristling with warnings as to the
dangers of hasty work and rash interpretations. It would not have 
been so passé in 1923, when it was written, as it is now in 1926, but in 
the interval there have been so many dismal and doughty Jeremiahs. 

The second essay deals with present day schools of psychology, and 
is mainly concerned with a discussion and criticism of behaviorism.
The rise of the behavioristic movement is described and its develop-
ment to the inclusion of “vocal behavior” and “graphic behavior”
is entertainingly traced. Freudianism is dealt with valiantly, without
gloves. McDougall’s “instinctivism” is disposed of quickly at the
close. 

The third essay deals with the psychological factors in spiritualism 
and discusses such topics as why certain eminent scientific men are 
found among its adherents, why there are so many so-called proofs 
and whether mediums themselves may not be honestly deceived. It 
is a well-written and interesting essay. 

The fourth essay deals with the psychology of the comic in which 
the thesis is maintained that we find amusement in what is inferior to 
us. The last essay deals with the reading of character from external 
signs. The professional character analyst, his methods and his 
chances of success are discussed. The reviewer enjoyed the last 
three essays best, and he is sure that many psychologists and educators 
will derive amusement and profit from their perusal. R. PINTNER. 

Teachers College, Columbia University. 


ELEMENTARY STATISTICS, SIMPLY EXPLAINED


A Primer of Graphics and Statistics for Teachers. By Harold Rugg. 
Boston: Houghton Mifflin Company, 1925. Pp. 142. $1.60. 


Ever since the present wide-spread interest in educational measure- 
ments began, an explanation of statistics has been needed, simple 
enough to be understood by principals and teachers without extensive 
training in mathematics and by students who, in a single course on 
Standardized Tests, have little time for learning the more elaborate 
methods of treating results. Dr. Rugg’s new Primer is the simplest 
statement that has yet been issued. In it the author has used the 
same system of careful “rationalization” of processes which charac-
terized his junior high school texts in mathematics. Each new type
of computation is introduced by a discussion showing the practical 
problems which make this type of computation necessary. Several 
illustrations or several variations of the problem make perfectly 
obvious the value and meaning of the new process. A clear explana- 
tion of the arithmetical work is then followed by a convenient summary 
of steps, for reference.

In harmony with the book’s informal and semi-conversational tone,
common errors are pointed out—the student being explicitly warned 
against them—and questions on the material just covered are inter- 
spersed throughout the chapter, rather than being bunched at the end. 
The reader is told which processes he is likely to need actually to 
perform, and which he will merely need to understand for intelligent 
reading. 

One chapter is given over to the application of statistical distribu- 
tions to the giving of school marks, and another to their use in inter- 
preting test scores and as an aid in classroom exposition. An appended 
bibliography lists the best-known supplementary references under 
such numerous sub-heads as to make the location of needed informa- 
tion comparatively easy. 

The book begins and ends with graphic devices, and perhaps half 
its brief compass is given over to the drawings themselves. As the 
explanation and the charts are very closely related, the going is thus 
made considerably smoother. Since the assertion is now frequently 
made that statistics is really one of the easiest divisions of mathematics 
and is only considered difficult because of its accidental postponement 
till late in the school program, an interesting experiment would be to 
try out this new Primer in the upper years of the elementary school.
I believe it would function there. Its mastery by such children might 
assist the teachers, as summer school students, to that degree of self- 
confidence which, in their study of this subject, they now so conspicu- 
ously lack. DENTON L. GEYER.

Chicago Normal College.